
Subscribe to our Erlang Factory newsletter to receive the latest updates and news

Jared Flatow
Nokia Research Center

Speaker
Jared Flatow is an engineer at the Nokia Research Center (NRC) in Palo Alto. Before joining the NRC in 2009, Jared tackled large-data problems in bioinformatics at the Northwestern University Biomedical Informatics Center in Chicago. He was among the first to apply Map/Reduce to problems in bioinformatics, presenting his work using Hadoop at the Next-Generation Sequencing Data Analysis conference in 2008. Jared had long dreamed of rewriting a Map/Reduce framework in Python, but it was not until he discovered Disco that he realized how elegantly the strengths of Erlang and Python could be combined toward that end. Since then, Jared has become a contributor to Disco and co-architect of the simple, scalable Discodex data-indexing pipeline. His recent efforts have focused on using Discodex to build a full-stack, massive-scale data-visualization pipeline with his colleagues at the Nokia Research Center.

Jared Flatow is Giving the Following Talks
Discodex: intuitive data indexing

Disco combines the strengths of Erlang and Python to enable rapid development of massively parallel computational pipelines. Disco implements the MapReduce framework, making it a powerful platform for doing distributed computing on immense datasets.
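To make the MapReduce model concrete, here is a minimal single-process sketch of its two phases: a map step that emits (key, value) pairs and a reduce step that combines all values sharing a key. It does not use Disco's API; the helper names (word_map, word_reduce, run_mapreduce) are hypothetical, and in Disco the same phases would be distributed across a cluster rather than run in one loop.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical helpers: a single-process stand-in for a MapReduce job.
# Disco distributes these phases across many machines; nothing here calls Disco.

def word_map(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def word_reduce(key, values):
    """Reduce phase: combine all values that share a key."""
    return key, sum(values)

def run_mapreduce(inputs, mapper, reducer):
    """Map every input, group values by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(item) for item in inputs):
        groups[key].append(value)
    # Reduce keys in sorted order so the output order is deterministic.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

if __name__ == "__main__":
    lines = ["Erlang and Python", "Python and Disco"]
    print(run_mapreduce(lines, word_map, word_reduce))
    # {'and': 2, 'disco': 1, 'erlang': 1, 'python': 2}
```

Because the mapper and reducer are plain functions with no shared state, the framework is free to run many copies of each in parallel, which is what makes the model scale to very large inputs.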

The first step in building a data-driven system is indexing the data so that it can be accessed in logarithmic or constant time. Such random access is crucial for building online systems, and it is also valuable for optimizing many other applications that rely on lookups into the data.
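The two access-time classes mentioned above correspond to two familiar structures: a sorted array searched by bisection (logarithmic time, with range scans as a bonus) and a hash table (constant time on average). The sketch below illustrates both with Python's standard library; the class names SortedIndex and HashIndex are hypothetical and are not part of Discodex.

```python
import bisect

class SortedIndex:
    """Logarithmic-time lookups: keep keys sorted and binary-search them."""

    def __init__(self, pairs):
        # Sort once at build time (by key only); queries never modify the structure.
        items = sorted(pairs, key=lambda kv: kv[0])
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, key, default=None):
        # bisect_left finds the leftmost slot where key could sit: O(log n).
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def range(self, lo, hi):
        # Sorted order also supports inclusive range scans, which a hash table cannot.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return list(zip(self._keys[i:j], self._values[i:j]))

class HashIndex:
    """Constant-time (on average) lookups backed by a hash table."""

    def __init__(self, pairs):
        self._table = dict(pairs)

    def get(self, key, default=None):
        return self._table.get(key, default)
```

For example, `SortedIndex([("gene42", 7), ("gene07", 3)]).range("gene00", "gene50")` returns both entries in key order. The choice between the two is a trade-off: hashing gives the fastest point lookups, while sorting keeps the data ordered for range queries.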

`Discodex` builds on top of Disco, abstracting away some of the most common operations for organizing piles of raw data into distributed, append-only indices and querying them. By adopting Erlang-style immutability of data structures, it is possible to index and query billions of data items efficiently. Discodex adopts a strategy similar to Disco's in achieving this goal: making the interface so embarrassingly simple and intuitive that development time is never an excuse for not building an index.
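One way to picture an append-only index built from immutable data structures is as a sequence of frozen segments: indexing new data never modifies an existing segment, it only produces a new index with one more segment, and a query merges matches across segments. This is a minimal single-machine sketch under that assumption; it is not Discodex's actual implementation or API, and build_segment and AppendOnlyIndex are hypothetical names.

```python
from collections import defaultdict
from types import MappingProxyType

def build_segment(records, key_fn):
    """Group raw records by key into a read-only segment (hypothetical helper).

    The grouping mirrors a map step (extract a key) followed by a reduce step
    (collect every record that shares that key).
    """
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    # Freeze each value list as a tuple and expose the dict through a read-only view.
    return MappingProxyType({k: tuple(v) for k, v in groups.items()})

class AppendOnlyIndex:
    """An index made of immutable segments; new data only ever adds a segment."""

    def __init__(self, segments=()):
        self._segments = tuple(segments)

    def append(self, records, key_fn):
        # Return a new index instead of mutating this one, in the Erlang style;
        # existing segments are shared, not copied.
        return AppendOnlyIndex(self._segments + (build_segment(records, key_fn),))

    def query(self, key):
        # Collect matches from every segment, oldest segment first.
        results = []
        for segment in self._segments:
            results.extend(segment.get(key, ()))
        return results
```

Because no segment ever changes after it is built, older versions of the index stay valid for concurrent readers, and segments can be built independently (for example, by separate reduce tasks) and simply concatenated.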

In this talk we discuss the architecture of this awesome, open-source tool (with Erlang at its heart), and how to use it. We also provide a real-world example of using Discodex for data insight at Nokia, and the reason we built it in the first place.