Overview of the current literature
Software development practices in bioinformatics vary widely, and suggestions on how to start writing scientific code or improve development practices are numerous. Their usefulness often depends on the reader's prior training in computer science and in specific scientific research fields. Scholars with backgrounds in computer science and software development regularly share their perspectives on the state of scientific software, in bioinformatics and in scientific computing in general, in their respective journals. Meanwhile, bioinformaticians themselves offer guidelines for best practices in software development in bioinformatics journals. We examined the existing literature from both perspectives to better understand which practices are currently encouraged and how they are implemented. We aim to provide insight into the challenges of understanding, prioritizing, and adopting good practices.
Bioinformatics papers providing recommendations are plentiful and often share common themes. They may focus on specific suggestions, called rules or "tips and tricks," or broadly address good coding and data analysis practices, compiled into "guidelines." Other popular resources include online learning platforms, peer blog posts, open-source university lecture materials, and forums. These resources are crucial for bioinformaticians who are self-taught programmers.
However, prioritizing these guidelines and selecting and adapting the most fitting practices are challenging tasks. Some practices are easier to adopt for those with prior training in computer science or in specific scientific research fields. Moreover, the suggested practices do not always align with mainstream software standards. Developing scientific software raises unique challenges, so mainstream practices must be carefully adapted to fit our needs while keeping their essence.
To illustrate and discuss the current state of the literature, we selected various articles as entry points for bioinformaticians looking to enhance their programming skills. Their suggestions are summarized in the table below. Our first insight is that the sheer number of practices can be overwhelming for a trainee with minimal coding experience. Undoubtedly, all of these practices help reduce the cost of development, improve reproducibility, or enable better science. However, an individual researcher might need so long to adopt them that their benefit cannot be reaped within the time frame of the position. These articles often target early career researchers with minimal coding experience, so they frequently recommend both entry-level coding standards (see Software development 101 in the table below) and advanced software solutions (see Reproducibility in the table below).
Judging by the recurrence of the same themes across the years in which these papers were published, few approaches have succeeded in tackling this problem: there are too many concepts to follow and too little time to select the most fitting ones, let alone learn them. Inspired by software engineers, we asked whether teamwork would lower the barrier to learning good practices collectively. When looking directly for team efforts in academic software development, we found that the literature for bioinformaticians rarely covers coding as a team or leveraging the expertise of multiple bioinformaticians in software development. Beginners are typically advised to seek help when encountering problems rather than to collaborate. This advice includes consulting colleagues, finding mentors, or participating in online communities like Stack Overflow or Biostars (10.1371/journal.pcbi.1008645). However, this literature primarily emphasizes individual practices and specific scientific issues, and insufficiently addresses the team coding practices highlighted in the software engineering literature (Ko2024, 10.5334/jors.35). The one exception, in which coding practices are improved collectively at the team level within the bioinformatics community, is the Code Clubs described by Hagan et al. (10.1371/journal.pcbi.1008119). These clubs encourage collective software development through code reviews, pair programming, and educational workshops or seminars. In line with Hagan et al., our experience indicates that working collectively towards better software engineering proficiency increases the likelihood of adopting critical practices.
In summary, the journey towards better software development practices in bioinformatics is ongoing and multifaceted. It involves a combination of individual learning, collaborative efforts, and integrating advanced tools and methodologies to enhance the quality and reliability of scientific software.
Recommendations on how to improve software development
Recommendation | References |
---|---|
Software development 101 | |
Sanity check on input parameters | 10.1371/journal.pcbi.1005412 |
Do not hard-code changeable parameters and paths | 10.1371/journal.pcbi.1005412 |
Do not require superuser privileges for installation and usage | 10.1371/journal.pcbi.1005412 |
Advanced software development | |
Usage of design patterns | 10.1016/j.jss.2020.110848 |
Adoption of international best practice standards of software quality | 10.5281/zenodo.1172970 |
Regular refactoring | 10.1016/j.jss.2020.110848 |
Software development process | |
Continuous integration | 10.1016/j.jss.2020.110848, 10.1093/bib/bbw134 |
Agile software development methodology | 10.7717/peerj-cs.839, 10.1016/j.jss.2020.110848 |
Educated choice of software development methodology | 10.1109/CSEET.2009.44 |
Independent review of source code | 10.1109/CSEET.2009.44, 10.5281/zenodo.1172970, 10.1109/MIC.2014.88 |
Code quality monitoring | 10.1016/j.jss.2020.110848 |
Inclusion of appropriate license | 10.5281/zenodo.1172970, 10.1371/journal.pcbi.1005412 |
Cooperation between developers and users | 10.5281/zenodo.1172970 |
Testing and validation | |
Establish validation and acceptance procedures | 10.5281/zenodo.1172970, 10.1016/j.jmoldx.2017.11.003 |
Provide a small test set | 10.1371/journal.pcbi.1005412 |
Standardized tests | 10.1371/journal.pcbi.1005412, 10.1016/j.jss.2020.110848, 10.7717/peerj-cs.839, 10.5281/zenodo.1172970, 10.1109/CSEET.2009.44, 10.1371/journal.pcbi.1008645 |
Ensure reproducibility of results | 10.1371/journal.pcbi.1005412 |
Reproducibility | |
Standardized working environment and automation | 10.7717/peerj-cs.839, 10.5281/zenodo.1172970 |
Version control | 10.1371/journal.pcbi.1005412, 10.1016/j.jss.2020.110848, 10.5281/zenodo.1172970, 10.7717/peerj-cs.839, 10.7287/peerj.preprints.2996 |
Rely on package managers | 10.1371/journal.pcbi.1005412 |
Containerization for portability | 10.7717/peerj-cs.839, 10.1371/journal.pcbi.1005412 |
Tagging of software version for reproducibility | 10.1371/journal.pcbi.1005412 |
Documentation | |
User (and developer) documentation | 10.7717/peerj-cs.839, 10.5281/zenodo.1172970, 10.1371/journal.pcbi.1005412, 10.1109/CSEET.2009.44, 10.1093/bib/bbw134 |
Requirements gathering | 10.7717/peerj-cs.839, 10.1109/CSEET.2009.44 |
Description of the software version used, its configurations, and parameters in publications | 10.5281/zenodo.1172970 |
Community effort | |
Contribute to open-source development | 10.1109/MIC.2014.88 |
Reuse existing (reliable) software | 10.1016/j.jss.2020.110848, 10.1371/journal.pcbi.1005412 |
Preferentially selecting freely available open-source software | 10.5281/zenodo.1172970 |
Encourage user participation in the software development process | 10.5281/zenodo.1172970 |
Recognition and assignment of adequate time for quality-assured development | 10.5281/zenodo.1172970, 10.1109/CSEET.2009.44 |
Recognition of software development as academic achievement | 10.5281/zenodo.1172970, 10.1109/MIC.2014.88 |
Support for developer community for long-term maintenance (when applicable) | 10.5281/zenodo.1172970, 10.1109/MIC.2014.88 |
Financial support for software development and maintenance | 10.1109/MIC.2014.88, 10.5281/zenodo.1172970 |
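
As a concrete illustration of the entry-level recommendations listed under Software development 101 (sanity checks on input parameters, no hard-coded parameters or paths), the following is a minimal Python sketch. It is not taken from any of the cited articles: the tool, the option names `--input` and `--min-quality`, and the accepted range of 0 to 60 are hypothetical and chosen only for the example.

```python
import argparse
import sys
from pathlib import Path


def parse_args(argv=None):
    """Read paths and tunable parameters from the command line
    instead of hard-coding them in the script."""
    parser = argparse.ArgumentParser(description="Hypothetical read-filtering tool")
    parser.add_argument("--input", type=Path, required=True,
                        help="path to the input file")
    parser.add_argument("--min-quality", type=int, default=20,
                        help="quality threshold (assumed valid range: 0-60)")
    return parser.parse_args(argv)


def validate_args(args):
    """Sanity-check the parameters before any computation starts.

    Returns a list of human-readable problems (empty if all is well).
    """
    problems = []
    if not args.input.is_file():
        problems.append(f"input file not found: {args.input}")
    if not 0 <= args.min_quality <= 60:
        problems.append(
            f"--min-quality must be between 0 and 60, got {args.min_quality}"
        )
    return problems


def main(argv=None):
    """Return 0 on success and 2 when the parameters fail validation."""
    args = parse_args(argv)
    problems = validate_args(args)
    if problems:
        for problem in problems:
            print(f"error: {problem}", file=sys.stderr)
        return 2
    # The actual analysis would start here (omitted in this sketch).
    print(f"processing {args.input} with min quality {args.min_quality}")
    return 0
```

Calling `main(["--input", "reads.txt"])` from a script entry point would run the tool with the default threshold. Failing early with an explicit message, rather than midway through a long analysis, is the design choice this sketch is meant to show.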