Surviving Fire: Using Erlang as an OS to Achieve Massive Fault Tolerance

Sam Williams
Bringing Erlang to the Bare Metal

This talk provides an overview of a new project that uses one BEAM instance per core, running directly on the metal, as the base of an operating system. The OS has been built from the ground up to provide both software and hardware fault tolerance. Because of its structure, the operating system can withstand the failure of Erlang processes, of entire Erlang nodes, and of various hardware elements (such as CPU cores, RAM modules and hard drives) without total loss of operation. The talk will also give a brief background on the relevant operating system concepts, so that it is accessible to those not familiar with operating system design.

As well as describing the project and the progress made so far, this talk will demonstrate how we are testing the hardware fault tolerance of the operating system. These tests involve interrupting and damaging computers in various ways during operation, in order to catalogue how the operating system reacts.

Talk objectives:

To explain how Erlang, deployed as one BEAM instance per core running directly on the metal, can be used to build tolerance to hardware failures as well as software failures.

Target audience:

Anyone who is interested in fault tolerance, operating systems, or fire!

Sam is a PhD student building a scalable and fault-tolerant operating system using Erlang. This builds on his undergraduate dissertation project, in which he created a 'BEAM on bare-metal' OS targeting the Xen hypervisor. That project differed from ErlangOnXen in that it was a direct port of the BEAM, rather than using the Ling VM. Aside from his OS work, Sam has been building web applications with Erlang, YAWS, and a web framework for around six years. When Sam is not programming, he enjoys walking and climbing!


GitHub: samcamwilliams
