Modularization
Modular design is one of the most common approaches in modern programming because it supports the maintainability and extensibility of a software product. Over time, software may become increasingly challenging to maintain (10.1109/CSEET.2009.44). One can address this growing complexity by refactoring with modularization in mind. This means continuously monitoring the code, recognizing code that has grown too large, and restructuring it into smaller parts. It requires understanding that code is not a static entity but an ever-growing, ever-changing organism.
Based on the team's guidelines and experience, we found that moving from unstructured scripts to organized code with functions brings several benefits at a low cost. Consequently, this topic came up several times during our software quality seminars and code review sessions. Specifically, we dedicated software quality seminars to the following topics to improve modularization: object-oriented programming, class diagrams and the Unified Modeling Language (UML) in general, design patterns, software architecture, Snakemake, S4 objects, R package development, a case report on the organization of the JASPAR database project, and a review of the book The Pragmatic Programmer.
Modularization can take place at many levels. On a small scale, it means naming parts of the code and organizing them into functions. As the code grows, one can start refactoring it into classes and focus on the cohesion and coupling of the parts (Figures 1-2). When building a pipeline of scripts, one can identify cohesive modules that translate into Snakemake rules (Figures 3-4).
A recurring question is whether a script needs refactoring or can remain a prototype. Taschuk and Wilson suggest a cut-off: a script deserves refactoring once it is reused, shared with others, or used to produce findings in a publication. Although this criterion potentially covers most code written by bioinformaticians, we suggest weighing, case by case, the time spent improving a script against the time required to deal with sub-optimal code. With practice and exposure to a large body of code, modularization becomes the norm, narrowing the gap between a prototype and its refactored version.
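To illustrate the small-scale level of modularization, the hypothetical Python sketch below shows what a short analysis script might look like once it has been split into named functions. The task (filtering FASTA records by GC content) and all function names are assumptions chosen for illustration, not code from our group. Each function has a single responsibility, so it can be tested or reused on its own, and could later be grouped into a class or mapped onto a separate pipeline step.

```python
# Hypothetical example: the task and all names are illustrative assumptions.
from io import StringIO
from typing import Dict, List, Optional, TextIO


def parse_fasta(handle: TextIO) -> Dict[str, str]:
    """Read FASTA records into an {identifier: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]  # keep the ID, drop the description
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)


def filter_by_gc(records: Dict[str, str], minimum: float) -> Dict[str, float]:
    """Keep only records whose GC fraction is at least `minimum`."""
    result: Dict[str, float] = {}
    for name, sequence in records.items():
        value = gc_content(sequence)
        if value >= minimum:
            result[name] = value
    return result


def main(handle: TextIO, minimum: float = 0.5) -> Dict[str, float]:
    """Compose the steps; input handling stays separate from computation."""
    return filter_by_gc(parse_fasta(handle), minimum)


if __name__ == "__main__":
    example = StringIO(">seq1\nGGCC\n>seq2 description\nATAT\nGCAT\n")
    for name, value in main(example).items():
        print(f"{name}\t{value:.2f}")
```

Running the module prints `seq1	1.00`, since `seq2` (GC fraction 0.25) falls below the default threshold of 0.5. Keeping file handling in `parse_fasta` and `main` while `gc_content` stays a pure function is one possible design choice; it makes the core computation trivial to unit test without touching the file system.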