Live Distributed Spying

Brian Mitchell
Distributed Systems Engineer

During the development of Cloudant's core database distribution layer, it became difficult to test, or even observe, expected conditions across multiple Erlang nodes simultaneously. Spyglass is an experimental tool developed to help capture and validate properties of specific traces through portions of a distributed system. It aims to allow declarative observation of live distributed systems, ranging from integration test clusters to production clusters.

Talk objectives:

This talk will explain a few of the shortcomings of testing large-scale distributed systems with traditional testing tools. I will then describe how Spyglass helps solve some of these problems, both during the development cycle and in the operation of a production service. I'll also describe how it fits in with the rest of our testing and operational infrastructure.

Target audience:

This talk will be interesting to people using or building distributed systems. It aims to be accessible to an intermediate Erlang developer, though parts of it should prove interesting to those with more advanced backgrounds as well. The tool should be open sourced well before this talk is given, so rather than a complete tutorial, I will focus on inspiring people with new ways to work with their Erlang clusters.



Brian Mitchell works on the database engineering team at Cloudant. This involves complex distributed Erlang systems that require high performance and excellent fault tolerance. Accomplishing this has required exploring new types of tools to assist in the development and operation of the database as a service to Cloudant's customers.

