Comment | Resilience Lessons From the Fastly Outage

Charlotte Binstead

11 June 2021, 04.00am

The Fastly outage brought parts of the internet to a standstill. Charlotte Binstead, Marketing Manager at Cloudsoft, discusses the steps we can take to ‘plan for failure’ in the future.

Last week’s ‘#internetshutdown’, caused by an outage at content delivery network Fastly, demonstrated the importance of planning for failure, thinking about application reliability from a top-down perspective and setting a resilience strategy to combat fragile yet complex IT estates.

Thousands of sites were affected by the Fastly outage, including Amazon, Netflix, the BBC, the Guardian and Spotify. Whilst not being able to access your favourite show, news source or album for just shy of an hour might have been a mild inconvenience, more worrying was that the UK Government site gov.uk was also out of action.

Ultimately, the issue was resolved in under an hour, demonstrating the importance of observability and a short recovery time objective (RTO).

But if that fix hadn’t been identified so quickly, it could have caused significant issues for a lot more people – the vast majority of whom now rely on being able to access Government services online and on-demand.

It is also interesting to note this event happened in the same week that Ofcom revealed that the pandemic drove us to spend more time than ever online.

New dependencies, new vulnerabilities

The Fastly network outage revealed the new dependencies and vulnerabilities that are emerging from the complexity of modern technology landscapes. Yet, while individual organisations have more complex tech stacks than ever, the vendor landscape is becoming more homogeneous – meaning a single outage has the potential to impact even more end-users.

When time is money, limiting the damage (reputational or technological) that such an outage might cause, however rare it might be, is vital. Those organisations impacted by last week’s events are likely now weighing up any fallout and how they can limit the impact of such an event happening again – particularly if they are part of a regulated industry.

Interoperability and regulating resilience

The drive towards regulating resilience in industries such as financial services seeks to prevent precisely this issue. If several large banks are using the same third-party provider of a service, and that provider fails, then what?

Fortunately, in the Fastly case, a fix was made within an hour. Had it instead taken days to resolve, it could have had a serious economic impact across the global financial markets, causing regulators to question why backup options were not in place to protect organisations and their customers against such an eventuality.

We are already starting to see the regulation of resilience in the industry. The Financial Conduct Authority (FCA) recently published its final guidance on operational resilience in the financial services sector, which comes into force in March next year and aligns with the EU’s Digital Operational Resilience Act (DORA).

Commonalities exist across both pieces of guidance. Namely, in identifying any vulnerabilities in their operational resilience, firms are expected to have:

Identified their important business services
Set impact tolerances for the maximum tolerable disruption, and
Carried out mapping and testing to a level of sophistication necessary to do so
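
For readers who want to see what these three steps might look like in practice, here is a minimal Python sketch. It is an illustration only, not the FCA’s or DORA’s method: the service names, provider names and tolerances are invented, and the `BusinessService`, `breaches` and `shared_providers` names are hypothetical. It records each important business service with an impact tolerance and its mapped third-party providers, then tests which services would breach tolerance in a given outage and which providers are shared across services.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BusinessService:
    """An important business service (step 1) with illustrative resilience data."""
    name: str
    impact_tolerance_minutes: int  # step 2: maximum tolerable disruption
    providers: Set[str] = field(default_factory=set)  # step 3: mapped dependencies


def breaches(services: List[BusinessService], provider: str, outage_minutes: int) -> List[str]:
    """Names of services that depend on `provider` and whose impact
    tolerance would be exceeded by an outage of `outage_minutes`."""
    return sorted(
        s.name
        for s in services
        if provider in s.providers and outage_minutes > s.impact_tolerance_minutes
    )


def shared_providers(services: List[BusinessService]) -> List[str]:
    """Providers that more than one service depends on (concentration risk)."""
    counts = {}
    for s in services:
        for p in s.providers:
            counts[p] = counts.get(p, 0) + 1
    return sorted(p for p, n in counts.items() if n > 1)


# Invented example data: names and tolerances are assumptions, not real figures.
services = [
    BusinessService("online-payments", 30, {"cdn-a", "cloud-x"}),
    BusinessService("account-statements", 240, {"cdn-a"}),
    BusinessService("branch-locator", 1440, {"maps-z"}),
]

# Test a hypothetical 49-minute outage at one shared provider.
print(breaches(services, "cdn-a", 49))
print(shared_providers(services))
```

Even a simple mapping like this makes the “then what?” question concrete: it shows which services sit behind the same provider and which of them could not tolerate an outage of a given length.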

Towards continuous resilience

These steps will allow organisations to think more holistically when considering the resilience of their systems. Certainly, we find that more complex infrastructures breed fragility, and so for systems to be resilient they need, by definition, to become more elastic.

One way to achieve this elasticity is through orchestration. This approach cuts through the complexity of the landscape instead of adding to it.

Gartner calls this category of tooling the ‘Digital Platform Conductor’ – a new breed of tool that provides technology leaders with visibility of the hybrid digital infrastructure they have to ensure it delivers value.

Join the Debate | Cloud First Summit

Cloudsoft is a sponsor at the upcoming Cloud First Virtual Summit on 23 June.

The conference will bring together senior technologists, Cloud architects and business transformation specialists to explore current trends, new advancements and best practice in Cloud computing.

Tags: business, cloud, digital, Fastly, Internet, IT, outages, technology

