
Dependency management

A fundamental aspect of bioinformatics software development is integrating functionality from other software packages, that is, collections of code that perform specific tasks. Because scientific software relies on many dependencies and on a whole ecosystem of tools, managing these dependencies is crucial for the reproducibility of findings.

In a team setting, it is natural to give all developers identical environments so that everyone runs the code under the same conditions. The software quality seminars on dependency management covered container solutions (Docker, Apptainer/), R package development, and Anaconda. We also set up a DockerHub account for our group to publish our custom containers; this makes it easy, for example, to install our Snakemake pipelines on different servers.

Here we give examples of two angles of dependency management. First, we share a previous and a current version of a script in which the placement of the package imports is improved. The same code also illustrates modularization: the linear script is rearranged into a setup section followed by functions. In addition, we improved documentation and usability by using named arguments instead of positional ones.

Figure 1. An example of dependency management within the code: previous version.
Figure 2. An example of dependency management within the code: current version.
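
A minimal, hypothetical R sketch of the same three ideas follows: package dependencies declared once at the top, a setup section followed by functions, and calls that use named arguments. It is not the code shown in the figures; the helper `count_per_group()` and the example data are invented for illustration and rely only on base R so that the sketch runs without extra packages.

```r
## ---- Setup: all package dependencies are declared once, at the top ----
## A real script would attach e.g. tidyr and ggplot2 here; this sketch only
## attaches the base 'stats' package so that it runs without extra installs.
library(stats)

## ---- Functions ----
## Hypothetical helper: count rows per group and keep the groups that have
## at least `min_count` rows. Defaults document the expected usage.
count_per_group <- function(data, group_column, min_count = 1) {
  counts <- table(data[[group_column]])
  counts <- counts[counts >= min_count]
  data.frame(group = names(counts),
             n = as.integer(counts),
             stringsAsFactors = FALSE)
}

## ---- Main ----
example_data <- data.frame(
  sample = c("a", "a", "b", "c", "c", "c"),
  value  = c(1, 2, 3, 4, 5, 6)
)

## Named arguments make the call self-documenting and independent of the
## order in which the parameters are declared
result <- count_per_group(data = example_data,
                          group_column = "sample",
                          min_count = 2)
print(result)
```

The same principle carries over to command-line scripts, where named options (for example via the optparse package listed in requirements.R) are easier to read and document than positional arguments.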

Second, we share an example of documenting requirements in which the responsibility for installing the software moves from the user to the developer.

README-based solution: the user must install the dependencies. Versions and sources may be given, but compatibility after updates is not guaranteed.


    ## Installation

    - R (version >= 3.6.1)
    - CAGEr (version >= 2.6.1)
        For installation, follow the instructions at
        https://bioconductor.org/packages/release/bioc/vignettes/CAGEr/inst/doc/CAGEexp.html#normalization
    - BSgenome.Hsapiens.UCSC.hg38
    - tidyr
    - viridis
    - ggplot2
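
At best, a README-based setup lets the user check minimum versions. As a hedged sketch, the hypothetical helper `check_min_versions()` below compares installed package versions with minimum versions such as those listed above, using only base R (`requireNamespace()`, `utils::packageVersion()` and `utils::compareVersion()`).

```r
## Hypothetical helper: compare installed versions with the minimum versions
## listed in a README. `requirements` is a named character vector
## (package name = minimum version); one report row is returned per package.
check_min_versions <- function(requirements) {
  rows <- lapply(names(requirements), function(pkg) {
    minimum <- requirements[[pkg]]
    ## NA when the package cannot be loaded at all
    installed <- if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
    ## compareVersion() returns 1, 0 or -1 (later, equal, earlier)
    ok <- !is.na(installed) &&
      utils::compareVersion(installed, minimum) >= 0
    data.frame(package = pkg, minimum = minimum, installed = installed,
               ok = ok, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

## README requirement "R (version >= 3.6.1)"; stops with an error otherwise
stopifnot(getRversion() >= "3.6.1")

## If CAGEr is not installed, its row shows installed = NA and ok = FALSE
report <- check_min_versions(c(CAGEr = "2.6.1", stats = "3.6.1"))
print(report)
```

Such a check only catches versions that are too old; it says nothing about whether newer versions remain compatible, which is the gap the container-based solution addresses.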

Container-based solution: the user can either use the publicly available container that includes a snapshot of all necessary requirements, or build their own environment.


    ## Installation

    Container available at https://hub.docker.com/r/cbgr/cager261
    For details, refer to requirements.R

The content of requirements.R is shown below.


    ## Prepend the container's library folder to the library search path
    .libPaths(c("/opt/software", .libPaths()))

    ## CRAN packages:
    packages_cran <- c(
        "optparse",   ## Read in data
        "tidyr",      ## Data formatting and manipulation
        "ggplot2"     ## Plotting
        ## [...]
    )
    message(
        "; Installing these R packages from CRAN repository: ",
        paste(packages_cran, collapse = ", "))
    install.packages(
        packages_cran,
        repos = "https://cran.uib.no/",
        lib = "/opt/software")

    ## Bioconductor packages:
    packages_bioconductor <- c("BSgenome.Hsapiens.UCSC.hg38")
    message(
        "; Installing these R Bioconductor packages: ",
        paste(packages_bioconductor, collapse = ", "))
    BiocManager::install(
        packages_bioconductor,
        lib = "/opt/software")
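
One caveat with this script: `install.packages()` and `BiocManager::install()` report a failed installation as a warning rather than an error, so `RUN Rscript requirements.R` can complete successfully with packages missing. A minimal sketch of a check that could be appended to requirements.R follows; `assert_installed()` is a hypothetical helper, not part of any package.

```r
## Hypothetical helper, meant for the end of requirements.R: stop with an
## error if any requested package cannot be loaded from the library paths.
assert_installed <- function(packages) {
  loadable <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  missing <- packages[!loadable]
  if (length(missing) > 0) {
    ## An uncaught stop() makes Rscript exit with a non-zero status
    stop("Packages failed to install: ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

## Example with base packages, which are always available
assert_installed(c("stats", "utils"))
message("All requested packages can be loaded")
```

In requirements.R itself, the call would be `assert_installed(c(packages_cran, packages_bioconductor))`, placed after both install steps; the resulting non-zero exit status makes the Docker build fail instead of silently producing an incomplete image.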

The content of the Dockerfile is shown below.


    # Base image with R 4.3 and Bioconductor 3.17
    FROM bioconductor/bioconductor_docker:3.17
    
    # Set up folder structure
    WORKDIR /opt/software
    
    # Install CAGEr from the Bioconductor 3.17 release (version 2.6.1 in this image)
    RUN R -e 'BiocManager::install("CAGEr")'
    
    # Install other R dependencies
    COPY requirements.R /opt/software/requirements.R
    RUN Rscript requirements.R
    # Make the packages installed in /opt/software visible to R at run time
    ENV R_LIBS=${R_LIBS}:/opt/software
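
Because requirements.R installs whichever versions CRAN and Bioconductor provide at build time, rebuilding the image later may yield different versions. Assuming the goal is to document exactly what the published snapshot contains, one option is to write a version manifest into the image as a final build step. The sketch below is hypothetical: `write_version_manifest()` and the output file name are illustrative choices, not part of the group's setup.

```r
## Hypothetical helper: record the R version and the version of each
## requested package in a tab-separated manifest file. packageVersion()
## raises an error if a package is not installed.
write_version_manifest <- function(packages, file) {
  versions <- vapply(packages, function(pkg) {
    as.character(utils::packageVersion(pkg))
  }, character(1), USE.NAMES = FALSE)
  manifest <- data.frame(
    package = c("R", packages),
    version = c(as.character(getRversion()), versions),
    stringsAsFactors = FALSE
  )
  utils::write.table(manifest, file = file, sep = "\t",
                     quote = FALSE, row.names = FALSE)
  invisible(manifest)
}

## Example with base packages, written to a temporary file
manifest <- write_version_manifest(
  c("stats", "utils"),
  file = file.path(tempdir(), "versions.tsv"))
print(manifest)
```

At the end of requirements.R, the call might pass `c(packages_cran, packages_bioconductor, "CAGEr")` and a path under /opt/software, so that the manifest ships with the container.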