Scalable software and reproducible workflows for data analysis

Multi-omics data sets are highly complex and, with improved instrumentation, increasingly large. The software to analyze such data sets needs to be scalable and adaptable to different experiments and data setups. Commercial software often acts as a black box: users put data in and get results without much control over the internal processing or understanding of the underlying algorithms. Moreover, results, in particular with omics data, are often hard to reproduce, in most cases because of a lack of proper reporting, unreliable software, or poor adherence to data analysis standards.

To overcome these issues, the Centre’s bioinformaticians actively develop and optimize open-source software within the Bioconductor project, a global repository of software packages for genome-scale data. A complete ecosystem of software packages for efficient and scalable data analysis is developed collaboratively with international partners, following an open software development strategy. These software libraries were downloaded by over 190,000 distinct users in 2025. In addition, standards and workflows for reproducible and transparent data analysis are being developed as a community effort.

By prioritizing open science, we have established software solutions and data analysis workflows that ensure FAIR (findable, accessible, interoperable, and reusable) and transparent research, ultimately leading to robust and reproducible results.