B1: Scheduling and Adaptive Execution of Data Analysis Workflows across Heterogeneous Infrastructures

Description

The efficient implementation of complex data analysis workflows (DAWs) in various scientific disciplines requires deep knowledge of a large stack: an abstract DAW description, its compilation into a logical plan, the mapping onto the currently available infrastructure, and the appropriate configuration of execution engines. Components and configurations developed for one computational infrastructure are often unsuitable for another, leading either to undesirable platform lock-in or to a considerable loss of efficiency.

The goal of subproject B1 is therefore to improve the portability of DAWs across infrastructures. To this end, we

  • compare DAW requirements with declarative descriptions of the available infrastructure,
  • profile both DAWs and infrastructure as needed, and
  • then map the DAWs onto the infrastructure using novel scheduling and load balancing (SLB) techniques to automatically optimize efficiency.

Ultimately, we aim to let scientists focus on the domain-specific challenges in their DAWs, while our new components automatically select and efficiently use the available computing infrastructure.
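To make the three steps above more concrete, the following is a minimal sketch and not the subproject's actual method. It assumes a declarative node description (`Node`), task requirements and a profiled work estimate (`Task`), and a simple greedy earliest-finish-time heuristic standing in for the SLB techniques. All names, fields, and numbers are hypothetical; tasks are treated as independent, and each node runs one task at a time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical declarative description of one machine."""
    name: str
    cores: int
    memory_gb: int
    speed: float  # assumed relative speed factor obtained from profiling


@dataclass
class Task:
    """Hypothetical requirements of one workflow task."""
    name: str
    cores: int
    memory_gb: int
    work: float  # assumed abstract work units obtained from profiling


def feasible(task: Task, node: Node) -> bool:
    # Step 1: compare task requirements with the node description.
    return node.cores >= task.cores and node.memory_gb >= task.memory_gb


def schedule(tasks: list[Task], nodes: list[Node]) -> tuple[dict[str, str], float]:
    """Greedy mapping: largest tasks first, each onto the feasible node
    on which it would finish earliest. Returns (plan, makespan)."""
    busy_until = {n.name: 0.0 for n in nodes}
    plan: dict[str, str] = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        candidates = [n for n in nodes if feasible(task, n)]
        if not candidates:
            raise ValueError(f"no node satisfies the requirements of {task.name}")
        # Step 3: pick the node with the earliest estimated finish time.
        best = min(candidates, key=lambda n: busy_until[n.name] + task.work / n.speed)
        busy_until[best.name] += task.work / best.speed
        plan[task.name] = best.name
    return plan, max(busy_until.values())


# Step 2 (profiling) is represented only by the made-up speed and work values.
nodes = [Node("small", 4, 16, 1.0), Node("large", 16, 64, 2.0)]
tasks = [Task("align", 8, 32, 40.0), Task("qc", 2, 4, 10.0), Task("report", 1, 2, 4.0)]
plan, makespan = schedule(tasks, nodes)
print(plan, makespan)
```

In this toy setting the only task that needs the large node is placed there, and the smaller tasks are balanced onto the otherwise idle small node. A real scheduler would additionally have to respect task dependencies, data transfer costs, and concurrent use of a node.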

Publications

15 entries « 1 of 2 »

2022

Jonathan Will; Lauritz Thamsen; Jonathan Bader; Dominik Scheinert; Odej Kao

Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing Inproceedings

In: 2022 IEEE International Conference on Big Data (IEEE BigData 2022), IEEE, 2022.


Dominik Scheinert; Soeren Becker; Jonathan Bader; Lauritz Thamsen; Jonathan Will; Odej Kao

Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics Inproceedings

In: 2022 IEEE International Conference on Big Data (IEEE BigData 2022), IEEE, 2022.


Jonathan Bader; Nicolas Zunker; Soeren Becker; Odej Kao

Leveraging Reinforcement Learning for Task Resource Allocation in Scientific Workflows Inproceedings

In: 2022 IEEE International Conference on Big Data (IEEE BigData 2022), IEEE, 2022.


Jonathan Bader; Joel Witzke; Soeren Becker; Ansgar Lößer; Fabian Lehmann; Leon Doehler; Anh Duc Vu; Odej Kao

Towards Advanced Monitoring for Scientific Workflows Inproceedings

In: 2022 IEEE International Conference on Big Data (IEEE BigData 2022), IEEE, 2022.


Jonathan Will; Lauritz Thamsen; Jonathan Bader; Dominik Scheinert; Odej Kao

Get Your Memory Right: The Crispy Resource Allocation Assistant for Large-Scale Data Processing Inproceedings

In: 2022 IEEE International Conference on Cloud Engineering (IC2E), IEEE, 2022.


Jonathan Bader; Kevin Styp-Rekowski; Leon Doehler; Soeren Becker; Odej Kao

Macaw: The Machine Learning Magnetometer Calibration Workflow Inproceedings

In: 2022 International Conference on Data Mining Workshops (ICDMW), IEEE, 2022.


Jonathan Bader; Fabian Lehmann; Alexander Groth; Lauritz Thamsen; Dominik Scheinert; Jonathan Will; Ulf Leser; Odej Kao

Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures Inproceedings

In: 41st International Performance Computing and Communications Conference 2022, IEEE, 2022.


Jonathan Bader; Fabian Lehmann; Lauritz Thamsen; Jonathan Will; Ulf Leser; Odej Kao

Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters Inproceedings

In: 34th International Conference on Scientific and Statistical Database Management (SSDBM 2022), pp. 1–12, ACM, 2022.


Lauritz Thamsen; Dominik Scheinert; Jonathan Will; Jonathan Bader; Odej Kao

Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview Journal Article

In: Datenbank-Spektrum, vol. 22, pp. 143–151, 2022.


Dominik Scheinert; Alireza Alamgiralem; Jonathan Bader; Jonathan Will; Thorsten Wittkopp; Lauritz Thamsen

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds Inproceedings

In: 2021 IEEE International Conference on Big Data (Big Data), pp. 3113–3118, 2022.

