Scalable Multi-Language Data Analysis on Beam: The Cuneiform Experience

Jörgen Brandt
Scalable Workflow Languages Expert

The need to analyze large scientific data sets on the one hand, and the availability of distributed compute resources with an increasing number of CPU cores on the other, have promoted the development of a variety of systems for distributed data analysis. Erlang, a language focused on concurrency and asynchronous communication, is a natural fit for orchestrating concurrent, distributed computation. In this talk, we discuss the building blocks of the distributed execution environment underlying the Cuneiform workflow language. We show how Erlang actors and behaviours can be composed into a system that concurrently runs workloads comprising large numbers of independent tasks and accesses large amounts of data through distributed file systems.

Cuneiform is a minimal workflow language focused on parallelism and integration. Users create workflows by defining and calling deterministic, side-effect-free tasks. The Cuneiform interpreter automatically derives task and data parallelism from a workflow. Tasks can be defined in any programming language, so external tools and libraries can be integrated with minimal effort. We demonstrate how workflows integrating external bioinformatics libraries are specified in Cuneiform and how parallel execution takes place on the Erlang VM, which orchestrates distributed compute resources and file systems.

Talk objectives

The talk's objective is to make the case for the Erlang VM as a platform for large-scale workflow orchestration.

Target audience

People interested in workflow languages and distributed data analysis.


Jörgen Brandt is a PhD student at Humboldt-Universität zu Berlin. His research interests include next-generation sequencing, scientific workflows, and functional programming languages. He graduated in Computer Science, specializing in intelligent systems, from the Technische Universität Berlin in 2011, and in Information Technology and Networked Systems from the Hochschule für Technik und Wirtschaft in 2008.

GitHub: joergen7

Twitter: @joergenbr
