Continued from B5: Adaptive, Distributed and Scalable Analysis of Massive Satellite Data
Description
Data analysis workflows (DAWs) for satellite data are highly heterogeneous and complex in terms of input data, software, and resource requirements. Furthermore, the input data are often held at different data centers, which makes data download a major bottleneck in DAW execution.
Our goal is to extend the orchestration of Earth Observation Workflows (EOWs) from FONDA I to a federated multi-center scenario, enabling analysis of changes in agricultural land use over very large, heterogeneous, and distributed data sets.
Scientists
- Felix Kummer
- Katarzyna Ewa Lewińska
Publications
2025
Lehmann, Fabian; Bader, Jonathan; Tschirpke, Friedrich; Mecquenem, Ninon De; Lößer, Ansgar; Becker, Sören; Lewińska, Katarzyna Ewa; Thamsen, Lauritz; Leser, Ulf
WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows Proceedings Article
In: 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Tromsø, Norway, 2025.
@inproceedings{lehmannWOW2025,
title = {WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows},
author = { Fabian Lehmann and Jonathan Bader and Friedrich Tschirpke and Ninon De Mecquenem and Ansgar Lößer and Sören Becker and Katarzyna Ewa Lewińska and Lauritz Thamsen and Ulf Leser},
year = {2025},
date = {2025-05-01},
urldate = {2025-05-01},
booktitle = {2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)},
address = {Tromsø, Norway},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
West, Kathleen; Lehmann, Fabian; Bountris, Vasilis; Leser, Ulf; Elkhatib, Yehia; Thamsen, Lauritz
Exploring the Potential of Carbon-Aware Execution for Scientific Workflows Proceedings Article
In: 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Tromsø, Norway, 2025.
@inproceedings{lehmannWOW2025b,
title = {Exploring the Potential of Carbon-Aware Execution for Scientific Workflows},
author = { Kathleen West and Fabian Lehmann and Vasilis Bountris and Ulf Leser and Yehia Elkhatib and Lauritz Thamsen},
year = {2025},
date = {2025-05-01},
urldate = {2025-05-01},
booktitle = {2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)},
address = {Tromsø, Norway},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lewińska, Katarzyna Ewa; Okujeni, Akpona; Kowalski, Katja; Lehmann, Fabian; Radeloff, Volker C.; Leser, Ulf; Hostert, Patrick
Impact of data density and endmember definitions on long-term trends in ground cover fractions across European grasslands Journal Article
In: Remote Sensing of Environment, vol. 323, pp. 114736, 2025, ISSN: 0034-4257.
@article{LEWINSKA2025114736,
title = {Impact of data density and endmember definitions on long-term trends in ground cover fractions across European grasslands},
author = { Katarzyna Ewa Lewińska and Akpona Okujeni and Katja Kowalski and Fabian Lehmann and Volker C. Radeloff and Ulf Leser and Patrick Hostert},
url = {https://www.sciencedirect.com/science/article/pii/S0034425725001403},
doi = {10.1016/j.rse.2025.114736},
issn = {0034-4257},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
journal = {Remote Sensing of Environment},
volume = {323},
pages = {114736},
abstract = {Long-term monitoring of grasslands is pivotal for ensuring continuity of many environmental services and for supporting food security and environmental modeling. Remote sensing provides an irreplaceable source of information for studying changes in grasslands. Specifically, Spectral Mixture Analysis (SMA) allows for quantification of physically meaningful ground cover fractions of grassland ecosystems (i.e., green vegetation, non-photosynthetic vegetation, and soil), which is crucial for our understanding of change processes and their drivers. However, although popular due to straightforward implementation and low computational cost, ‘classical’ SMA relies on a single endmember definition for each targeted ground cover component, thus offering limited suitability and generalization capability for heterogeneous landscapes. Furthermore, the impact of irregular data density on SMA-based long-term trends in grassland ground cover has also not yet been critically addressed. We conducted a systematic assessment of i) the impact of data density on long-term trends in ground cover fractions in grasslands; and ii) the effect of endmember definition used in ‘classical’ SMA on pixel- and map-level trends of grassland ground cover fractions. We performed our study for 13 sites across European grasslands and derived the trends based on the Cumulative Endmember Fractions calculated from monthly composites. We compared three different data density scenarios, i.e., 1984–2021 Landsat data record as is, 1984–2021 Landsat data record with the monthly probability of data after 2014 adjusted to the pre-2014 levels, and the combined 1984–2021 Landsat and 2015–2021 Sentinel-2 datasets. For each site we ran SMA using a selection of site-specific and generalized endmembers, and compared the pixel- and map-level trends. Our results indicated no significant impact of varying data density on the long-term trends from Cumulative Endmember Fractions in European grasslands. 
Conversely, the use of different endmember definitions led in some regions to significantly different pixel- and map-level long-term trends raising questions about the suitability of the ‘classical’ SMA for complex landscapes and large territories. Therefore, we caution against using the ‘classical’ SMA for remote-sensing-based applications across broader scales or in heterogeneous landscapes, particularly for trend analyses, as the results may lead to erroneous conclusions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2024
Bader, Jonathan; Skalski, Fabian; Lehmann, Fabian; Scheinert, Dominik; Will, Jonathan; Thamsen, Lauritz; Kao, Odej
Sizey: Memory-Efficient Execution of Scientific Workflow Tasks Proceedings Article
In: 2024 IEEE International Conference on Cluster Computing (CLUSTER), 2024.
@inproceedings{bader2024Sizey,
title = {Sizey: Memory-Efficient Execution of Scientific Workflow Tasks},
author = {Jonathan Bader and Fabian Skalski and Fabian Lehmann and Dominik Scheinert and Jonathan Will and Lauritz Thamsen and Odej Kao},
url = {https://arxiv.org/pdf/2407.16353},
year = {2024},
date = {2024-09-21},
urldate = {2024-09-21},
booktitle = {2024 IEEE International Conference on Cluster Computing (CLUSTER)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Bader, Jonathan; Lehmann, Fabian; Thamsen, Lauritz; Leser, Ulf; Kao, Odej
Lotaru: Locally predicting workflow task runtimes for resource management on heterogeneous infrastructures Journal Article
In: Future Generation Computer Systems, vol. 150, pp. 171-185, 2024, ISSN: 0167-739X.
@article{BADER2023,
title = {Lotaru: Locally predicting workflow task runtimes for resource management on heterogeneous infrastructures},
author = {Jonathan Bader and Fabian Lehmann and Lauritz Thamsen and Ulf Leser and Odej Kao},
url = {https://www.sciencedirect.com/science/article/pii/S0167739X23003229},
doi = {10.1016/j.future.2023.08.022},
issn = {0167-739X},
year = {2024},
date = {2024-01-01},
urldate = {2023-01-01},
journal = {Future Generation Computer Systems},
volume = {150},
pages = {171-185},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Lewińska, Katarzyna Ewa; Frantz, David; Leser, Ulf; Hostert, Patrick
Usable observations over Europe: evaluation of compositing windows for Landsat and Sentinel-2 time series Journal Article
In: European Journal of Remote Sensing, vol. 57, no. 1, pp. 2372855, 2024.
@article{doi:10.1080/22797254.2024.2372855,
title = {Usable observations over Europe: evaluation of compositing windows for Landsat and Sentinel-2 time series},
author = {Katarzyna Ewa Lewińska and David Frantz and Ulf Leser and Patrick Hostert},
url = {https://doi.org/10.1080/22797254.2024.2372855},
doi = {10.1080/22797254.2024.2372855},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {European Journal of Remote Sensing},
volume = {57},
number = {1},
pages = {2372855},
publisher = {Taylor \& Francis},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Lehmann, Fabian; Bader, Jonathan; Mecquenem, Ninon De; Wang, Xing; Bountris, Vasilis; Friederici, Florian; Leser, Ulf; Thamsen, Lauritz
Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows Proceedings Article
In: 2024 IEEE 20th International Conference on e-Science (e-Science), pp. 1-10, 2024.
@inproceedings{lehmannPonder2024,
title = {Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows},
author = { Fabian Lehmann and Jonathan Bader and Ninon De Mecquenem and Xing Wang and Vasilis Bountris and Florian Friederici and Ulf Leser and Lauritz Thamsen},
url = {https://ieeexplore.ieee.org/document/10678682},
doi = {10.1109/e-Science62913.2024.10678682},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {2024 IEEE 20th International Conference on e-Science (e-Science)},
pages = {1-10},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sänger, Mario; Mecquenem, Ninon De; Lewińska, Katarzyna Ewa; Bountris, Vasilis; Lehmann, Fabian; Leser, Ulf; Kosch, Thomas
A Qualitative Assessment of Using ChatGPT as Large Language Model for Scientific Workflow Development Journal Article
In: GigaScience, 2024, ISSN: 2047-217X.
@article{saenger2024a,
title = {A Qualitative Assessment of Using ChatGPT as Large Language Model for Scientific Workflow Development},
author = { Mario Sänger and Ninon De Mecquenem and Katarzyna Ewa Lewińska and Vasilis Bountris and Fabian Lehmann and Ulf Leser and Thomas Kosch},
url = {https://doi.org/10.1093/gigascience/giae030},
doi = {10.1093/gigascience/giae030},
issn = {2047-217X},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {GigaScience},
abstract = {Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of large language models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed 3 user studies in 2 scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions. Our results show a high accuracy for comprehending and explaining scientific workflows while achieving a reduced performance for modifying and extending workflow descriptions. These findings clearly illustrate the need for further research in this area.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Schintke, Florian; Belhajjame, Khalid; Mecquenem, Ninon De; Frantz, David; Guarino, Vanessa Emanuela; Hilbrich, Marcus; Lehmann, Fabian; Missier, Paolo; Sattler, Rebecca; Sparka, Jan Arne; Speckhard, Daniel T.; Stolte, Hermann; Vu, Anh Duc; Leser, Ulf
Validity constraints for data analysis workflows Journal Article
In: Future Generation Computer Systems, vol. 157, pp. 82–97, 2024, ISSN: 0167-739X.
@article{SCHINTKE2024,
title = {Validity constraints for data analysis workflows},
author = {Florian Schintke and Khalid Belhajjame and Ninon De Mecquenem and David Frantz and Vanessa Emanuela Guarino and Marcus Hilbrich and Fabian Lehmann and Paolo Missier and Rebecca Sattler and Jan Arne Sparka and Daniel T. Speckhard and Hermann Stolte and Anh Duc Vu and Ulf Leser},
url = {https://www.sciencedirect.com/science/article/pii/S0167739X24001079},
doi = {10.1016/j.future.2024.03.037},
issn = {0167-739X},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {Future Generation Computer Systems},
volume = {157},
pages = {82--97},
abstract = {Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even only a new dataset with some notably different properties is often challenging. Despite the structured definition of the steps (tasks) and their interdependencies during a complex data analysis in the DAW specification, relevant assumptions may remain unspecified and implicit. Such hidden assumptions often lead to crashing tasks without a reasonable error message, poor performance in general, non-terminating executions, or silent wrong results of the DAW, to name only a few possible consequences. Searching for the causes of such errors and drawbacks in a distributed compute cluster managed by a complex infrastructure stack, where DAWs for large datasets typically are executed, can be tedious and time-consuming. We propose validity constraints (VCs) as a new concept for DAW languages to alleviate this situation. A VC is a constraint specifying logical conditions that must be fulfilled at certain times for DAW executions to be valid. When defined together with a DAW, VCs help to improve the portability, adaptability, and reusability of DAWs by making implicit assumptions explicit. Once specified, VCs can be controlled automatically by the DAW infrastructure, and violations can lead to meaningful error messages and graceful behaviour (e.g., termination or invocation of repair mechanisms). We provide a broad list of possible VCs, classify them along multiple dimensions, and compare them to similar concepts one can find in related fields. We also provide a proof-of-concept implementation for the workflow system Nextflow.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2023
Bader, Jonathan; Belak, Jim; Bement, Matthew; Berry, Matthew; Carson, Robert; Cassol, Daniela; Chan, Stephen; Coleman, John; Day, Kastan; Duque, Alejandro; et al.
Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments Proceedings Article
In: Proceedings of the SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 2097–2108, 2023.
@inproceedings{bader2023novel,
title = {Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments},
author = {Jonathan Bader and Jim Belak and Matthew Bement and Matthew Berry and Robert Carson and Daniela Cassol and Stephen Chan and John Coleman and Kastan Day and Alejandro Duque and others},
url = {https://dl.acm.org/doi/pdf/10.1145/3624062.3626283},
year = {2023},
date = {2023-11-18},
urldate = {2023-01-01},
booktitle = {Proceedings of the SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis},
pages = {2097--2108},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}