A2: Adapting Genomic Data Analysis Workflows for Different Data Access Patterns

Description

The runtime of data analysis workflows (DAWs) in distributed infrastructures is often dominated by the time required for data access and data exchange (DADE), which in turn depends on the data being analyzed, the tasks being executed, and the infrastructure on which a DAW runs. A change in any of these aspects can quickly degrade runtime if the DAW is not adapted accordingly. Subproject A2 investigates methods for adapting a given DAW to new input data or a different infrastructure, with the goal of keeping runtime low. A2 is an interdisciplinary project; it will develop its research using DAWs for large-scale genome data analysis, which are typically I/O-heavy and thus depend particularly on efficient DADE operations. It will cooperate closely with subproject A6 by also testing its newly developed methods on DAWs for finding structural genomic variations, and it will use the hardware abstractions developed in B1. The project will be carried out by Prof. Reinert, an expert in data structures and algorithms for genomic data, and Prof. Leser, an expert in the optimization of UDF-heavy DAWs.

PIs

Publications

2022

Marcus Hilbrich; Sebastian Müller; Svetlana Kulagina; Christopher Lazik; Ninon De Mecquenem; Lars Grunske

A Consolidated View on Specification Languages for Data Analysis Workflows Inproceedings

In: Margaria, Tiziana; Steffen, Bernhard (Eds.): Leveraging Applications of Formal Methods, Verification and Validation. Software Engineering, pp. 201–215, Springer Nature Switzerland, Cham, 2022, ISBN: 978-3-031-19756-7.


2020

Christopher Schiefer; Marc Bux; Jörgen Brandt; Clemens Messerschmidt; Knut Reinert; Dieter Beule; Ulf Leser

Portability of Scientific Workflows in NGS Data Analysis: A Case Study Journal Article

In: CoRR, vol. abs/2006.03104, 2020.
