B1: Scheduling and Adaptive Execution of Data Analysis Workflows across Heterogeneous Infrastructures


The efficient implementation of complex data analysis workflows (DAWs) in various scientific disciplines requires deep knowledge of a large stack that consists of an abstract DAW description, compilation of a logical plan, mapping onto the currently available infrastructure, and appropriate configuration of execution engines. Components and configurations developed for one computational infrastructure are often unsuitable for another, leading either to undesirable platform lock-in or to a considerable loss of efficiency.

The goal of subproject B1 is therefore to improve portability. To this end, we

  • compare DAW requirements with declarative descriptions of the available infrastructure,
  • profile both DAWs and infrastructure as needed, and
  • then map the DAWs onto the infrastructure using novel scheduling and load balancing (SLB) techniques to automatically optimize efficiency.
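As a rough illustration of the three steps above, the following minimal sketch matches task requirements against declarative node descriptions and then places each task greedily on the feasible node with the earliest finish time. It is not the SLB technique developed in B1: the class names `Node` and `Task`, the fields `speed` and `work` (stand-ins for profiling results), and all numbers are hypothetical, and task dependencies within a DAW are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Declarative description of one machine (hypothetical fields)."""
    name: str
    cores: int
    memory_gb: int
    speed: float  # relative speed factor, assumed to come from profiling


@dataclass(frozen=True)
class Task:
    """Resource requirements of one DAW task (hypothetical fields)."""
    name: str
    cores: int
    memory_gb: int
    work: float  # estimated work units, assumed to come from profiling


def feasible(task: Task, node: Node) -> bool:
    """Compare task requirements with the node description."""
    return node.cores >= task.cores and node.memory_gb >= task.memory_gb


def schedule(tasks: list[Task], nodes: list[Node]) -> tuple[dict[str, str], float]:
    """Greedy placement: largest task first, earliest finish time wins.

    Returns the task-to-node placement and the resulting makespan.
    Dependencies between tasks are deliberately not modelled here.
    """
    busy_until = {node.name: 0.0 for node in nodes}
    placement: dict[str, str] = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        candidates = [node for node in nodes if feasible(task, node)]
        if not candidates:
            raise ValueError(f"no node satisfies the requirements of {task.name}")
        best = min(candidates, key=lambda n: busy_until[n.name] + task.work / n.speed)
        busy_until[best.name] += task.work / best.speed
        placement[task.name] = best.name
    return placement, max(busy_until.values())


# Illustrative infrastructure and workflow; all values are made up.
nodes = [Node("small", 4, 16, 1.0), Node("large", 16, 64, 2.0)]
tasks = [
    Task("align", 8, 32, 40.0),
    Task("filter", 2, 4, 10.0),
    Task("report", 1, 2, 4.0),
]
placement, makespan = schedule(tasks, nodes)
print(placement, makespan)
```

In this toy setup the memory-hungry `align` task can only run on the large node, so the two small tasks end up on the small node even though it is slower. A realistic scheduler would additionally have to respect the DAW's dependency structure and react to changing infrastructure at runtime.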

Ultimately, we aim to allow scientists to focus on the domain-specific challenges in their DAWs, while our new components automatically select the available computing infrastructure and use it efficiently.





Jonathan Bader; Fabian Lehmann; Lauritz Thamsen; Jonathan Will; Ulf Leser; Odej Kao

