Edward Hiley is a principal systems engineer with NHS Digital, where he has worked on national services such as SUS+, a ground-up replacement of the current Secondary Uses Service (SUS). The project involves myriad challenges, including immutable infrastructure, distributed compute clusters, and multi-data-center deployments. Previously, Edward was a solution architect for the Health and Social Care Information Centre and an associate director for the National Institute for Health and Clinical Excellence.
Principal Systems Engineer
You had one job! Learning to cope with failures in a complex distributed system
Stuff breaks. It’s one of the fundamentals of IT, but there are some things you expect to just work. What happens when these things let you down, especially when they are part of a large distributed system?
Ed Hiley and Dan Rathbone offer an overview of the technical renaissance going on in parts of the NHS, where things are being done in a modern way. Ed and Dan explore a recently launched data processing system that uses Apache Spark, Riak, and Python and discuss the incidents they encountered along the way, in which components they took for granted simply stopped behaving as expected. Ed and Dan dive into some of these incidents to debunk the assumptions they made and explain how they troubleshot and fixed the issues.