Finding the Needle in the Haystack - or - Troubleshooting Distributed Systems

Anthony Molinaro
Developer Architect

Many tools are available for building distributed systems. A few in common use are cross-language RPC frameworks (Thrift), programming languages (Erlang), and libraries (riak_core). With these it becomes possible to build service-oriented architectures that scale horizontally at each layer and are loosely coupled between layers. System crashes and component failures are often easy to catch, but application-layer logic issues or data consistency issues can be hard to catch, because each interaction with the system can hit any subset of the machines.

To address this problem at OpenX, an open source system called Mondemand was modified and a methodology for cross-language tracing of system interactions was implemented. This allowed customer support, quality assurance, and engineering to gather the information needed to pinpoint issues and troubleshoot them more easily.

This talk will discuss the challenges of troubleshooting distributed systems, the setup of the Mondemand tracing system, the instrumentation of different services for tracing, and the use of traces to track down issues in your distributed systems.

Talk objectives:

This talk aims to show how the ability to follow requests through a large, distributed, service-oriented architecture can make finding and solving issues much easier.

Target audience:

Systems architects, developers, operations people, quality assurance engineers.

Slides
Video

Anthony Molinaro has been developing large-scale distributed systems since the late 90s in many languages and environments. He started at Goto.com, a pioneering company in search advertising, where he helped to develop many of the core serving pieces in Java, C, and Perl. After Goto.com changed its name to Overture.com and was acquired by Yahoo!, Anthony spent 5 years working on the content match advertising system, written in C. Upon leaving Yahoo! in 2008, he joined a small startup that used Erlang exclusively and extensively. Later, in 2009, Anthony was hired by OpenX, where he has since introduced Erlang and spearheaded its use across a large portion of OpenX’s Global Digital Revenue Platform.

Twitter: @djnym

Github: djnym

 

