FONDA PhD Defense: Jonathan Bader on “Task Resource Prediction for Efficient Execution of Scientific Workflows”

Jonathan Bader defended his doctoral dissertation “Task Resource Prediction for Efficient Execution of Scientific Workflows” with distinction on June 4, 2025. He is a member of the group “Distributed and Operating Systems” at TU Berlin, where he worked on FONDA subproject B1. His work focuses on predicting which tasks in a workflow are most resource-intensive in order to dynamically adjust resource allocation and scheduling.

As part of this research, he introduced Lotaru and Sizey, two novel methods for predicting task runtime and memory requirements, respectively. Lotaru allows researchers to create a sensible baseline resource allocation profile for a workflow based on the task requirements and target infrastructure. Sizey continuously predicts the amount of memory each task requires and adjusts the memory allocation at runtime to minimize over-allocation while also preventing failures. Both outperform previous methods and improve the efficiency of workflow execution.
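To illustrate the general idea of feedback-driven memory sizing described above, here is a minimal sketch. It is not Sizey's actual algorithm: the class `MemoryPredictor`, the helper `run_with_retry`, the linear model on input size, the 10% safety margin, and the doubling-on-failure policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryPredictor:
    """Hypothetical predictor: fits peak memory as a linear function of input size."""
    safety_factor: float = 1.1   # assumed 10% headroom above the estimate
    default_mb: float = 4096.0   # assumed allocation while too little history exists
    history: list = field(default_factory=list)  # (input_mb, peak_mb) pairs

    def observe(self, input_mb: float, peak_mb: float) -> None:
        """Record the measured peak memory of a finished task."""
        self.history.append((input_mb, peak_mb))

    def predict(self, input_mb: float) -> float:
        """Return a memory allocation in MB for a task with the given input size."""
        if len(self.history) < 2:
            return self.default_mb
        n = len(self.history)
        mean_x = sum(x for x, _ in self.history) / n
        mean_y = sum(y for _, y in self.history) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in self.history)
        if var_x == 0:
            # All inputs had the same size: fall back to the largest observed peak.
            estimate = max(y for _, y in self.history)
        else:
            cov = sum((x - mean_x) * (y - mean_y) for x, y in self.history)
            slope = cov / var_x
            estimate = (mean_y - slope * mean_x) + slope * input_mb
        return max(estimate, 0.0) * self.safety_factor


def run_with_retry(predictor, input_mb, actual_peak_mb, max_attempts=5):
    """Simulate one task: double the allocation after each out-of-memory failure."""
    allocation = predictor.predict(input_mb)
    for attempt in range(1, max_attempts + 1):
        if allocation >= actual_peak_mb:
            predictor.observe(input_mb, actual_peak_mb)  # learn from the success
            return allocation, attempt
        allocation *= 2  # task would have failed; retry with more memory
    raise MemoryError("task did not fit within the allowed attempts")
```

In this sketch, each finished task feeds its measured peak back into the predictor, so later allocations move closer to actual usage, while the retry loop covers underestimates.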

Congratulations, Jonathan!

FONDA PhD student Martin Kuban successfully defends his dissertation on “Classification of materials based on similarity measures”

Martin Kuban defended his doctoral thesis on April 15, 2025. He is a member of the Theoretical Solid-State Physics group at Humboldt-Universität zu Berlin. His work focused on extracting comparable “fingerprints” for materials from heterogeneous data sources in order to identify compounds that may have similar properties. As part of this work, he developed MADAS, a Python framework providing a modular and extensible interface for similarity calculations in materials science.

His contributions to subproject A3 in FONDA include automating this technique as a workflow that calculates the similarity between different instances of the same material in an open-source repository, where the material's features have been calculated using different sets of parameters. This allows for the automated detection of parameters that produce reliable results and the identification of those that introduce artifacts.

His excellent work and presentation earned the grade summa cum laude (with highest honors). Congratulations, Martin!

FONDA PhD student Mario Sänger successfully defends his PhD thesis on “Representation Learning for Biomedical Text Mining”

Mario Sänger, a member of the group “Human-computer interaction for Scientific Software”, successfully defended his PhD thesis on November 25, 2024. His work focuses on using representation learning to extract meaningful connections between biomedical entities, such as genes, diseases, proteins, and pharmaceuticals, from a corpus of PubMed abstracts as well as from biomedical knowledge bases. In addition to demonstrating the feasibility of this corpus-wide approach, he also benchmarked and tested existing pre-trained language models (PLMs) for sentence-level relation prediction. His results show that additional context from biomedical knowledge bases does not improve the most robust, carefully tuned PLMs.

In FONDA, he collaborated with Prof. Dr. Thomas Kosch, exploring the use of ChatGPT as a tool to support users in designing and implementing scientific workflows.

Congratulations, Mario, and all the best!

FONDA PhD student Sarah Kleest-Meißner successfully defends her PhD thesis on “Exploring the Complexity of Event Query Discovery”

Sarah Kleest-Meißner, a member of the research group “Logic in Computer Science” at HU Berlin’s computer science department, successfully defended her PhD thesis on September 10, 2024. She proposed an expressive, theoretical query model for sequence data, based on subsequences and patterns with variables, that captures the core of Complex Event Processing (CEP) languages. Building on this model, she presented an algorithm for discovering a query that best describes a given finite set of finite sequences of events. The theoretical basis of her query model enabled a comprehensive analysis of the complexity of event query discovery, while a prototypical implementation and an experimental evaluation with synthetic and real-world datasets complemented the formal results.
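As a rough illustration of what matching a pattern with variables as a subsequence can look like, consider the sketch below. It is a simplified toy model, not the query model from the thesis: events are assumed to be plain strings, variables are assumed to be marked with a leading `$`, and the function name `find_match` is hypothetical.

```python
def find_match(query, sequence):
    """Check whether `query` matches `sequence` as a subsequence.

    Query terms starting with "$" are variables: every occurrence of the same
    variable must be bound to the same event. Returns the variable bindings
    of the first match found, or None if there is no match. Note that a match
    of a query without variables returns an empty dict, which is falsy, so
    callers should compare the result against None.
    """

    def search(qi, si, bindings):
        if qi == len(query):
            return bindings  # every query term has been placed
        term = query[qi]
        for pos in range(si, len(sequence)):
            event = sequence[pos]
            if term.startswith("$"):
                bound = bindings.get(term)
                if bound is None:
                    # Unbound variable: try binding it to this event.
                    result = search(qi + 1, pos + 1, {**bindings, term: event})
                elif bound == event:
                    result = search(qi + 1, pos + 1, bindings)
                else:
                    continue
            elif term == event:
                result = search(qi + 1, pos + 1, bindings)
            else:
                continue
            if result is not None:
                return result
        return None  # backtrack: no placement works from this position

    return search(0, 0, {})


# Hypothetical sample: a query "describes" the sample if it matches every sequence.
sample = [
    ["login", "a", "b", "a", "logout"],
    ["login", "c", "c", "logout"],
]
query = ["login", "$x", "$x", "logout"]
describes_sample = all(find_match(query, s) is not None for s in sample)
```

Backtracking is needed here because binding a variable to the first available event can block a later match. In this toy setting, query discovery would amount to searching for a query that matches all sequences in the sample while being as specific as possible.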

Congratulations!