Tigres: Template Interfaces for Agile Parallel Data-Intensive Science
Our approach is inspired by the success of the MapReduce model. When the MapReduce model emerged from the Internet search space, it provided a radically simpler paradigm for parallel analysis by certain classes of applications. The simplicity of the API and analysis model enabled many applications to quickly script powerful scalable analyses. We propose to follow the example of MapReduce and provide abstractions that support a wide-array of common scientific application computational patterns.
We will focus on four research topics in this proposed work. First, we will design and implement the template abstraction to capture the core set of fundamental workflow patterns, which will allow users to compose collaborative workflow scripts in a programming language of their choice. Second, we will design and develop a hybrid execution mechanism for the templates that enables users to prototype their analysis workflows on desktops and seamlessly adapt them to run in production environments at scale. Third, we will provide programmatic interfaces that will allow automated and user-provided provenance tracking. Finally, we will provide interfaces to capture execution state and allow users to understand complex parallel faults encountered during execution of the workflow.
Examples of DOE science areas where this capability will make a significant difference include light-source science, cosmology, combustion, atmospheric and soil carbon, next generation ecosystem experiments, materials simulation, and gene sequencing. We will engage with our collaborators in these domains in a user-centered design process to define the templates, execution, provenance, and fault-tolerance interfaces. The resulting capability is expected to dramatically expand the access to collaborative scientific data analysis.
The key outcome of this project will be the Tigres template library that will allow users to quickly develop and test their analysis workflows on laptops and desktops and then subsequently move them into production HPC resources and large data volumes. Tigres is expected to significantly impact scientist productivity in collaborations by reducing complexity of composition, improving execution efficiency, and sharing of analysis templates in collaborations.