Dan Rathbone is cofounder and technical director of Infinity Works, a 200-strong consultancy and software house based out of Leeds and London, where he builds and operates high-scale and high-performance systems for Infinity Works’ clients. Most recently, Dan has been working with NHS Digital to drive the modernization of critical national services, reengineering them using FOSS, end-to-end DevOps teams, and Agile and Lean delivery techniques. Over his career, Dan has held a number of varied roles focusing on areas from infrastructure to frontend development and most things in between.
You had one job! Learning to cope with failures in a complex distributed system
Stuff breaks. It’s one of the fundamentals of IT, yet there are some things you expect to just work. What happens when those things let you down, especially when they are part of a large distributed system?
Ed Hiley and Dan Rathbone offer an overview of the technical renaissance going on in parts of the NHS, where systems are being built in a modern way. Ed and Dan explore a recently launched data processing system that uses Apache Spark, Riak, and Python and discuss the incidents they encountered along the way, when components they took for granted stopped behaving as expected. They dive into some of these incidents to debunk the assumptions they made and explain how they troubleshot and fixed the issues.