
Overview of the current literature

Software development practices in bioinformatics vary widely, and there are numerous suggestions on how to start writing scientific code or improve development practices. Their usefulness often depends on prior training in computer science and in specific scientific research fields. Interestingly, scholars with backgrounds in computer science and software development also regularly share their perspectives on the state of scientific software in bioinformatics, and in scientific computing in general, in their respective journals. Meanwhile, bioinformaticians themselves offer guidelines for best practices in software development in bioinformatics journals. We examined the existing literature on these two aspects to better understand what practices are currently encouraged and how they are implemented. We aim to provide insight into the challenges of understanding, prioritizing, and adopting good practices.

Bioinformatics papers providing recommendations are plentiful and often share common themes. They may focus on specific suggestions, called rules or "tips and tricks," or broadly address good coding and data analysis practices, compiled into "guidelines." Other popular resources include online learning platforms, peer blog posts, open-source university lecture materials, and forums. These resources are crucial for bioinformaticians who are self-taught programmers.

However, prioritizing these guidelines and selecting and adapting the most fitting practices are challenging tasks. Some practices are easier to adopt for those with prior training in computer science or in specific scientific research fields. Moreover, the suggested practices do not always align with mainstream software standards. Indeed, developing scientific software raises unique challenges that require carefully adapting mainstream practices to fit our needs while keeping their essence.

To illustrate and discuss the current status of the literature, we selected various articles as entry points for bioinformaticians looking to enhance their programming skills. Their suggestions are summarized in the table below. Our first insight is that the sheer number of practices can be overwhelming for a trainee with minimal coding experience. Undoubtedly, all of these practices help reduce the cost of development, improve reproducibility, or enable better science. However, an individual researcher might need so long to adopt them that their benefit cannot be reaped within the time frame of the position. Indeed, these articles often target early career researchers with minimal coding experience, so they frequently recommend both entry-level coding standards (see Software development 101 in the table below) and advanced software solutions (see Reproducibility in the table below).

Judging by the recurrence of the same themes across years of publication, few approaches have succeeded in tackling this problem: there are too many concepts to follow and too little time to select the most fitting ones, let alone learn them. Inspired by software engineers, we asked whether teamwork would lower the barrier to learning good practices collectively. When looking directly for team efforts in academic software development, we found that the literature for bioinformaticians rarely covers coding as a team or leveraging the expertise of multiple bioinformaticians in software development. Beginners are typically advised to seek help when they encounter problems rather than to collaborate. This advice includes consulting colleagues, finding mentors, or participating in online communities like Stack Overflow or Biostars (10.1371/journal.pcbi.1008645). However, this literature primarily emphasizes individual practices and specific scientific issues, and it insufficiently addresses the team coding practices highlighted in the software engineering literature (Ko2024, 10.5334/jors.35). The one exception, in which coding practices are improved collectively at the team level within the bioinformatics community, is the Code Clubs described by Hagan et al. (10.1371/journal.pcbi.1008119). These clubs encourage collective software development through code reviews, pair programming, and educational workshops or seminars. In line with Hagan et al., our experience indicates that working collectively towards better software engineering proficiency increases the likelihood of adopting critical practices.

In summary, the journey towards better software development practices in bioinformatics is ongoing and multifaceted. It involves a combination of individual learning, collaborative efforts, and integrating advanced tools and methodologies to enhance the quality and reliability of scientific software.

Recommendations on how to improve software development

Recommendation | References

Software development 101
Sanity check on input parameters | 10.1371/journal.pcbi.1005412
Do not hard-code changeable parameters and paths | 10.1371/journal.pcbi.1005412
Do not require superuser privileges for installation and usage | 10.1371/journal.pcbi.1005412

Advanced software development
Usage of design patterns | 10.1016/j.jss.2020.110848
Adoption of international best practice standards of software quality | 10.5281/zenodo.1172970
Regular refactoring | 10.1016/j.jss.2020.110848

Software development process
Continuous integration | 10.1016/j.jss.2020.110848, 10.1093/bib/bbw134
Agile software development methodology | 10.7717/peerj-cs.839, 10.1016/j.jss.2020.110848
Educated choice of software development methodology | 10.1109/CSEET.2009.44
Independent review of source code | 10.1109/CSEET.2009.44, 10.5281/zenodo.1172970, 10.1109/MIC.2014.88
Code quality monitoring | 10.1016/j.jss.2020.110848
Inclusion of appropriate license | 10.5281/zenodo.1172970, 10.1371/journal.pcbi.1005412
Cooperation between developers and users | 10.5281/zenodo.1172970

Testing and validation
Establish validation and acceptance procedures | 10.5281/zenodo.1172970, 10.1016/j.jmoldx.2017.11.003
Provide a small test set | 10.1371/journal.pcbi.1005412
Standardized tests | 10.1371/journal.pcbi.1005412, 10.1016/j.jss.2020.110848, 10.7717/peerj-cs.839, 10.5281/zenodo.1172970, 10.1109/CSEET.2009.44, 10.1371/journal.pcbi.1008645
Ensure reproducibility of results | 10.1371/journal.pcbi.1005412

Reproducibility
Standardized working environment and automation | 10.7717/peerj-cs.839, 10.5281/zenodo.1172970
Version control | 10.1371/journal.pcbi.1005412, 10.1016/j.jss.2020.110848, 10.5281/zenodo.1172970, 10.7717/peerj-cs.839, 10.7287/peerj.preprints.2996
Rely on package managers | 10.1371/journal.pcbi.1005412
Containerization for portability | 10.7717/peerj-cs.839, 10.1371/journal.pcbi.1005412
Tagging of software version for reproducibility | 10.1371/journal.pcbi.1005412

Documentation
User (and developer) documentation | 10.7717/peerj-cs.839, 10.5281/zenodo.1172970, 10.1371/journal.pcbi.1005412, 10.1109/CSEET.2009.44, 10.1093/bib/bbw134
Requirements gathering | 10.7717/peerj-cs.839, 10.1109/CSEET.2009.44
Description of the software version used, its configurations, and parameters in publications | 10.5281/zenodo.1172970

Community effort
Contribute to open-source development | 10.1109/MIC.2014.88
Reuse existing (reliable) software | 10.1016/j.jss.2020.110848, 10.1371/journal.pcbi.1005412
Preferentially selecting freely available open-source software | 10.5281/zenodo.1172970
Encourage user participation in the software development process | 10.5281/zenodo.1172970
Recognition and assignment of adequate time for quality-assured development | 10.5281/zenodo.1172970, 10.1109/CSEET.2009.44
Recognition of software development as academic achievement | 10.5281/zenodo.1172970, 10.1109/MIC.2014.88
Support for developer community for long-term maintenance (when applicable) | 10.5281/zenodo.1172970, 10.1109/MIC.2014.88
Financial support for software development and maintenance | 10.1109/MIC.2014.88, 10.5281/zenodo.1172970
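
To make the entry-level rows of the table more concrete, here is a minimal Python sketch of two "Software development 101" recommendations: reading changeable parameters and paths from the command line instead of hard-coding them, and sanity-checking them before any work starts. It is an illustration only and is not taken from the cited articles; the tool, the option names (--input, --min-length), and the functions parse_args and validate_args are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read paths and tunable parameters from the command line (nothing hard-coded)."""
    # Hypothetical tool and option names, chosen only for illustration.
    parser = argparse.ArgumentParser(description="Hypothetical sequence-filtering tool")
    parser.add_argument("--input", type=Path, required=True,
                        help="path to the input FASTA file")
    parser.add_argument("--min-length", type=int, default=50,
                        help="minimum sequence length to keep")
    return parser.parse_args(argv)


def validate_args(args):
    """Sanity-check the parameters early and fail with a clear message."""
    if not args.input.is_file():
        raise ValueError(f"input file not found: {args.input}")
    if args.min_length < 1:
        raise ValueError(f"--min-length must be at least 1, got {args.min_length}")
    return args
```

A script would typically call validate_args(parse_args()) as its first step, so that a wrong path or an impossible threshold is reported immediately rather than after a long computation. Passing an explicit list, as in parse_args(["--input", "reads.fasta"]), also makes the same functions easy to exercise from a small test set.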