Continued from B6: Distributed Run-Time Monitoring and Control of Data Analysis Workflows
Description
Data centers make a large and growing contribution to global energy consumption. Factors such as manufacturing, construction, disassembly, and green-energy emissions are frequently neglected in current estimates. The total climate footprint can vary dramatically depending on energy sources, hardware, applications, utilization, and life-cycle management. B6 aims at holistic management of the end-to-end energy profiles and climate footprints of ML-based data analysis workflows (DAWs), as well as of system-internal configuration and tuning knobs.

Scientists
- Philipp Ortner
- Ilin Tolovski
Publications
2022
Ihde, Nina; Marten, Paula; Eleliemy, Ahmed; Poerwawinata, Gabrielle; Silva, Pedro; Tolovski, Ilin; Ciorba, Florina M.; Rabl, Tilmann
A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks Proceedings Article
In: Nambiar, Raghunath; Poess, Meikel (Ed.): Performance Evaluation and Benchmarking, pp. 98–118, Springer International Publishing, Cham, 2022, ISBN: 978-3-030-94437-7.
@inproceedings{10.1007/978-3-030-94437-7_7,
title = {A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks},
author = {Nina Ihde and Paula Marten and Ahmed Eleliemy and Gabrielle Poerwawinata and Pedro Silva and Ilin Tolovski and Florina M. Ciorba and Tilmann Rabl},
editor = {Raghunath Nambiar and Meikel Poess},
isbn = {978-3-030-94437-7},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {Performance Evaluation and Benchmarking},
pages = {98--118},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {In recent years, there has been a convergence of Big Data (BD), High Performance Computing (HPC), and Machine Learning (ML) systems. This convergence is due to the increasing complexity of long data analysis pipelines on separated software stacks. With the increasing complexity of data analytics pipelines comes a need to evaluate their systems, in order to make informed decisions about technology selection, sizing and scoping of hardware. While there are many benchmarks for each of these domains, there is no convergence of these efforts. As a first step, it is also necessary to understand how the individual benchmark domains relate.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Strassenburg, Nils; Tolovski, Ilin; Rabl, Tilmann
Efficiently Managing Deep Learning Models in a Distributed Environment Proceedings Article
In: 25th International Conference on Extending Database Technology (EDBT '22), 2022.
@inproceedings{strassenburg_2022_mmlib,
title = {Efficiently Managing Deep Learning Models in a Distributed Environment},
author = {Nils Strassenburg and Ilin Tolovski and Tilmann Rabl},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {25th International Conference on Extending Database Technology (EDBT '22)},
abstract = {Deep learning has revolutionized many domains relevant in research and industry, including computer vision and natural language processing by significantly outperforming previous state-of-the-art approaches. This is why deep learning models are part of many essential software applications. To guarantee their reliable and consistent performance even in changing environments, they need to be regularly adjusted, improved, and retrained but also documented, deployed, and monitored. An essential part of this set of processes, referred to as model management, is to save and recover models. To enable debugging, many applications require an exact model representation. In this paper, we investigate if, and to what extent, we can outperform a baseline approach capable of saving and recovering models, while focusing on storage consumption, time-to-save, and time-to-recover. We present our Python library MMlib, offering three approaches: a baseline approach that saves complete model snapshots, a parameter update approach that saves the updated model data, and a model provenance approach that saves the model’s provenance instead of the model itself. We evaluate all approaches in four distributed environments on different model architectures, model relations, and data sets. Our evaluation shows that both the model provenance and parameter update approach outperform the baseline by up to 15.8% and 51.7% in time-to-save and by up to 70.0% and 95.6% in storage consumption, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
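The parameter update approach described in the abstract above can be illustrated with a minimal sketch. Plain Python dicts stand in for model parameter state, and the function names (`param_delta`, `recover`) are hypothetical, not MMlib's actual API:

```python
def param_delta(base, updated):
    """Keep only the parameters that changed relative to the base snapshot."""
    return {name: value for name, value in updated.items()
            if base.get(name) != value}

def recover(base, delta):
    """Reconstruct the updated model state from the base snapshot plus the delta."""
    state = dict(base)
    state.update(delta)
    return state

# Two model versions that share most of their parameters.
v1 = {"conv1.weight": [0.1, 0.2], "fc.weight": [0.5], "fc.bias": [0.0]}
v2 = {"conv1.weight": [0.1, 0.2], "fc.weight": [0.7], "fc.bias": [0.0]}

delta = param_delta(v1, v2)        # only "fc.weight" differs
assert recover(v1, delta) == v2    # exact model representation is recovered
```

Storing only `delta` instead of the full snapshot is what drives the storage and time-to-save savings reported in the paper.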
Benson, Lawrence; Rabl, Tilmann
Darwin: Scale-In Stream Processing Proceedings Article
In: 12th Annual Conference on Innovative Data Systems Research (CIDR ’22), 2022.
@inproceedings{benson_darwin_2022,
title = {Darwin: Scale-In Stream Processing},
author = {Lawrence Benson and Tilmann Rabl},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {12th Annual Conference on Innovative Data Systems Research (CIDR ’22)},
abstract = {Companies increasingly rely on stream processing engines (SPEs) to quickly analyze data and monitor infrastructure. These systems enable continuous querying of data at high rates. Current production-level systems, such as Apache Flink and Spark, rely on clusters of servers to scale out processing capacity. Yet, these scale-out systems are resource inefficient and cannot fully utilize the hardware. As a solution, hardware-optimized, single-server, scale-up SPEs were developed. To get the best performance, they neglect essential features for industry adoption, such as larger-than-memory state and recovery. This requires users to choose between high performance or system availability. While some streaming workloads can afford to lose or reprocess large amounts of data, others cannot, forcing them to accept lower performance. Users also face a large performance drop once their workloads slightly exceed a single server and force them to use scale-out SPEs. To acknowledge that real-world stream processing setups have drastically varying performance and availability requirements, we propose scale-in processing. Scale-in processing is a new paradigm that adapts to various application demands by achieving high hardware utilization on a wide range of single- and multi-node hardware setups, reducing overall infrastructure requirements. In contrast to scaling-up or -out, it focuses on fully utilizing the given hardware instead of demanding more or ever-larger servers. We present Darwin, our scale-in SPE prototype that tailors its execution towards arbitrary target environments through compiling stream processing queries while providing recoverable larger-than-memory state management. Early results show that Darwin achieves an order of magnitude speed-up over current scale-out systems and matches processing rates of scale-up systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Del Monte, Bonaventura; Zeuch, Steffen; Rabl, Tilmann; Markl, Volker
Rethinking Stateful Stream Processing with RDMA Proceedings Article
In: ACM SIGMOD International Conference on Management of Data (SIGMOD ’22), 2022.
@inproceedings{delmonte2022rethinking,
title = {Rethinking Stateful Stream Processing with RDMA},
author = {Bonaventura Del Monte and Steffen Zeuch and Tilmann Rabl and Volker Markl},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {ACM SIGMOD International Conference on Management of Data (SIGMOD ’22)},
abstract = {Remote Direct Memory Access (RDMA) hardware has bridged the gap between network and main memory speed and thus invalidated the common assumption that network is often the bottleneck in distributed data processing systems. However, high-speed networks do not provide "plug-and-play" performance (e.g., using IP-over-InfiniBand) and require a careful co-design of system and application logic. As a result, system designers need to rethink the architecture of their data management systems to benefit from RDMA acceleration. In this paper, we focus on the acceleration of stream processing engines, which is challenged by real-time constraints and state consistency guarantees. To this end, we propose Slash, a novel stream processing engine that uses high-speed networks and RDMA to efficiently execute distributed streaming computations. Slash embraces a processing model suited for RDMA acceleration and scales out by omitting the expensive data re-partitioning demands of scale-out SPEs. While scale-out SPEs rely on data re-partitioning to execute a query over many nodes, Slash uses RDMA to share mutable state among nodes. Overall, Slash achieves a throughput improvement up to two orders of magnitude over existing systems deployed on an InfiniBand network. Furthermore, it is up to a factor of 22 faster than a self-developed solution that relies on RDMA-based data repartitioning to scale out query processing.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Maltenberger, Tobias; Ilic, Ivan; Tolovski, Ilin; Rabl, Tilmann
Evaluating Multi-GPU Sorting with Modern Interconnects Proceedings Article
In: 2022 ACM SIGMOD International Conference on Management of Data (SIGMOD ’22), 2022.
@inproceedings{maltenberger2022evaluating,
title = {Evaluating Multi-GPU Sorting with Modern Interconnects},
author = {Tobias Maltenberger and Ivan Ilic and Ilin Tolovski and Tilmann Rabl},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {2022 ACM SIGMOD International Conference on Management of Data (SIGMOD ’22)},
abstract = {In recent years, GPUs have become a mainstream accelerator for database operations such as sorting. Most of the published GPU-based sorting algorithms are single-GPU approaches. Consequently, they neither harness the full computational power nor exploit the high-bandwidth P2P interconnects of modern multi-GPU platforms. In particular, the latest NVLink 2.0 and NVLink 3.0-based NVSwitch interconnects promise unparalleled multi-GPU acceleration. Regarding multi-GPU sorting, there are two types of algorithms: GPU-only approaches, utilizing P2P interconnects, and heterogeneous strategies that employ the CPU and the GPUs. So far, both types have been evaluated at a time when PCIe 3.0 was state-of-the-art. In this paper, we conduct an extensive analysis of serial, parallel, and bidirectional data transfer rates to, from, and between multiple GPUs on systems with PCIe 3.0, PCIe 4.0, NVLink 2.0, and NVLink 3.0-based NVSwitch interconnects. We measure up to 35.3× higher parallel P2P copy throughput with NVLink 3.0-powered NVSwitch over PCIe 3.0 interconnects. To study multi-GPU sorting on today’s hardware, we implement a P2P-based (P2P sort) and a heterogeneous (HET sort) multi-GPU sorting algorithm and evaluate them on three modern systems. We observe speedups over state-of-the-art parallel CPU-based radix sort of up to 14× for P2P sort and 9× for HET sort. On systems with high-speed P2P interconnects, we demonstrate that P2P sort outperforms HET sort by up to 1.65×. Finally, we show that overlapping GPU copy and compute operations to mitigate the transfer bottleneck does not yield performance improvements on modern multi-GPU platforms.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
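The general shape of a multi-GPU sort, as described in the abstract above, is partition, sort per device, then merge the sorted runs. A minimal CPU-only sketch of that pattern (Python lists stand in for per-GPU buffers; this is illustrative, not the paper's P2P or HET implementation):

```python
import heapq
import random

def multi_partition_sort(data, num_devices=4):
    """Sort each partition independently (as each GPU would sort its chunk),
    then k-way merge the sorted runs into one output."""
    chunk = max(1, (len(data) + num_devices - 1) // num_devices)
    runs = [sorted(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    return list(heapq.merge(*runs))

values = [random.randrange(1000) for _ in range(10_000)]
assert multi_partition_sort(values) == sorted(values)
```

On real hardware, the interesting part is precisely what this sketch hides: moving the partitions across PCIe or NVLink interconnects, which is the transfer bottleneck the paper measures.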
2021
Traub, Jonas; Grulich, Philipp Marian; Cuéllar, Alejandro Rodríguez; Breß, Sebastian; Katsifodimos, Asterios; Rabl, Tilmann; Markl, Volker
Scotty: General and Efficient Open-Source Window Aggregation for Stream Processing Systems Journal Article
In: Transactions on Database Systems, vol. 46, no. 1, pp. 46, 2021.
@article{noauthororeditorb,
title = {Scotty: General and Efficient Open-Source Window Aggregation for Stream Processing Systems},
author = {Jonas Traub and Philipp Marian Grulich and Alejandro Rodríguez Cuéllar and Sebastian Breß and Asterios Katsifodimos and Tilmann Rabl and Volker Markl},
year = {2021},
date = {2021-03-01},
urldate = {2021-03-01},
journal = {Transactions on Database Systems},
volume = {46},
number = {1},
pages = {46},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Gévay, Gábor E.; Rabl, Tilmann; Breß, Sebastian; Madai-Tahy, Loránd; Quiané-Ruiz, Jorge-Arnulfo; Markl, Volker
Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance Proceedings Article
In: 37th IEEE International Conference on Data Engineering, 2021.
@inproceedings{noauthororeditor,
title = {Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance},
author = {Gábor E. Gévay and Tilmann Rabl and Sebastian Breß and Loránd Madai-Tahy and Jorge-Arnulfo Quiané-Ruiz and Volker Markl},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {37th IEEE International Conference on Data Engineering},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Benson, Lawrence; Makait, Hendrik; Rabl, Tilmann
Viper: An Efficient Hybrid PMem-DRAM Key-Value Store Proceedings Article
In: Proceedings of the VLDB Endowment, vol. 14, no. 9, pp. 1544–1556, 2021.
@inproceedings{benson_viper_2021,
title = {Viper: An Efficient Hybrid PMem-DRAM Key-Value Store},
author = {Lawrence Benson and Hendrik Makait and Tilmann Rabl},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {Proceedings of the VLDB Endowment},
volume = {14},
number = {9},
pages = {1544-1556},
abstract = {Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4–18x for write workloads, while matching or surpassing their get performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
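The hybrid design described in the abstract above, a volatile DRAM hash index pointing into persistent storage, can be sketched in a few lines. Here an append-only file stands in for PMem, and the class and method names are hypothetical, not Viper's C++ API:

```python
import os
import tempfile

class HybridKV:
    """Toy key-value store in the spirit of the Viper design: an in-memory
    (DRAM-side) hash index maps keys to offsets in an append-only log that
    stands in for persistent memory. Illustrative only, not Viper's code."""

    def __init__(self, path):
        self.path = path
        self.index = {}               # DRAM index: key -> (offset, length)
        self.log = open(path, "ab")   # sequential, append-only "PMem" writes

    def put(self, key, value):
        record = value.encode()
        offset = self.log.tell()
        self.log.write(record)
        self.log.flush()              # a real PMem store would use flushes/fences
        self.index[key] = (offset, len(record))

    def get(self, key):
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode()

path = os.path.join(tempfile.mkdtemp(), "viper_toy.log")
kv = HybridKV(path)
kv.put("a", "hello")
kv.put("b", "world")
assert kv.get("a") == "hello" and kv.get("b") == "world"
```

The split mirrors the paper's rationale: random-access index updates go to fast volatile memory, while the persistent medium only sees sequential appends, its most efficient write pattern.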
