To focus the scope of phase I, research was restricted to the following scenario: an expert user runs one workflow to analyze locally available data at a single data center powered by unlimited energy. Phase II of FONDA lifts these assumptions, challenging researchers to develop workflows that consider sustainability, usability, and multi-site execution (SUM).
Sustainability
Many projects in phase II focus on improving sustainability in the environmental sense. Examples include optimizing DAWs so that they require less energy in the first place (A2) and scheduling energy-intensive tasks to run on compute nodes powered by wind or solar (B1). Subproject B6 will monitor the end-to-end energy consumption of running DAWs. Meanwhile, B7 applies DAWs to detecting forest mortality, using algorithms designed to reduce unnecessary re-computations on fast-changing datasets.
Other projects improve technological sustainability by looking into ways that existing workflows can be shared, reused, and maintained. This includes developing recommender systems that allow users to find existing tools and DAWs (A3), developing techniques to systematically assess interactions between such components (A7), and reaching out to support research communities that develop and exchange DAWs (C1).
Usability
The focus on technological sustainability is directly linked to the usability of DAWs. When workflows can be shared and reused more readily, they are more easily adopted by domain scientists. This expands the user base of DAWs beyond workflow experts.
In this phase of FONDA, a new research area (C) has been added to explore DAW design. C1 will explore how to improve the (re)usability of community-generated workflows in neuroimaging research. In C2, researchers will look into the very early stages of DAW design, where research questions are first translated into DAW specifications. The focus of C3 is on understanding how domain scientists use DAWs when they can interact with workflow tasks during runtime. Subproject A5 also focuses on usability by laying the groundwork for interactive, annotation-efficient machine learning workflows for image analysis.
Multi-Site
One of the foundational assumptions of the first phase is that the data is locally available in a single location. This is not necessarily true, especially for massive datasets. Project A1 will focus on the formal foundations of validating distributed DAWs through provenance analysis. Meanwhile, project B4 will develop new models and tools to improve network throughput. Project B1 will develop scheduling techniques that work across multiple sites. B5 and B7 modify the workflows themselves, either by executing sub-workflows remotely while a large static dataset remains at a different site, or by efficiently updating DAW results to include incremental changes as new data arrives.
By addressing SUM principles, FONDA continues to increase human productivity in scientific data analysis on an even larger scale.