Next-Generation Sequencing data analysis

Are you new to NGS and have no idea how to analyze your data? Here you can find events, discuss experiments, and get in touch with experts.

David Langenberger: The Low Quality Of Scientific Code
That's the harsh truth, but how can this problem be solved? Can it be solved? What's your experience?
On the one hand, scientific code is produced by scientists, who are mostly interested in the quality of their scientific publications. Quality is measured in the currency of the domain, e.g. new cancer drug targets that guarantee good publications. The code is a mere means of getting to the signal in the data and is not recognized as a product in itself. The situation is made worse by languages like AWK and Bash, which make it easy to produce unstructured spaghetti code but are unbeatably efficient for certain simple tasks. Structured programming is impossible if even the concept of a function is unknown or forgotten (a real-life example of mine).
On the other hand, scientific code is often heavily tested on huge and diverse datasets. It is robust to variation in the input data and to things like compute node failures. Unfortunately, the tests are not repeatable, and the robustness stems from adapting the code ad hoc to solve immediate problems (to meet some deadline or satisfy the PI).
Moving such code from science to production, e.g. to serve a wider community, is hard at best. And there is simply no money for scientific programmers to reprogram everything. Decision-makers, up to and in particular at the highest level, do not value software as a product of science. They would rather invest their money in hardware that quickly becomes outdated than in supporting the people and projects needed to produce quality software.


