9 min read

That Monday morning, your phone seemed to have a mind of its own. From Snapchat to Venmo, countless apps and websites suddenly stopped working, leaving millions confused and disconnected.
The culprit was a major failure at Amazon Web Services, a behind-the-scenes tech giant. The outage exposed how concentrated critical systems have become and how much customers’ architecture choices matter.
Relying on a single cloud region or provider can multiply the impact when a major provider has problems.

Think of Amazon Web Services, or AWS, as the internet’s invisible powerhouse, not just part of the online shopping giant. It’s a massive cloud computing platform that provides the digital backbone for countless companies.
When you send a snap or order food, there’s a good chance AWS is working in the background. This setup is efficient and cost-effective for businesses, allowing them to focus on their services. However, it also means that when AWS has a problem, the effects can ripple across the globe instantly.

The trouble began early on October 20, 2025, in AWS’s US-EAST-1 (northern Virginia) region. An internal DNS-resolution problem affected DynamoDB’s service endpoints.
DynamoDB is AWS’s NoSQL database service; when its endpoints couldn’t be resolved, many apps could not find the data they needed, triggering cascading failures.
Initially, the problems seemed isolated to a few services. Engineers were quickly alerted and began working to diagnose the mysterious glitch. They soon realized this was not a small local issue but the start of a cascading failure that would soon impact millions.
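For developers, the failure mode can be illustrated with a tiny sketch, assuming nothing beyond Python’s standard library (the hostnames here are placeholders, not AWS’s real endpoints): an app must first resolve a service’s hostname to an IP address before it can connect, and during the outage that first step failed.

```python
import socket

# Illustrative only: an app first resolves a service endpoint (for AWS,
# something like "dynamodb.us-east-1.amazonaws.com") to an IP address,
# then connects. During the outage, resolution failed, so apps never
# got as far as opening a connection.
def lookup(hostname: str) -> str:
    try:
        return socket.gethostbyname(hostname)  # the DNS-resolution step
    except socket.gaierror as err:
        # Roughly what affected apps experienced: the name simply
        # could not be turned into an address.
        return f"DNS lookup failed: {err}"

print(lookup("localhost"))              # resolves locally, no network needed
print(lookup("no-such-host.invalid"))   # simulates a resolution failure
```

The `.invalid` top-level domain is reserved and never resolves, which makes it a safe stand-in for the broken record without touching any real service.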

Within hours, the outage spread like a digital wave, taking down a stunning variety of popular services. Social media platforms like Snapchat and Reddit became unusable, while payment apps like Venmo and Coinbase froze up.
Gamers found themselves locked out of Roblox, and streaming on Prime Video was interrupted. This widespread disruption showed that no corner of the internet was safe, from entertainment and social connections to essential financial tools and food delivery services.

The disruption wasn’t just about entertainment; it seriously hampered productivity and education. Workplace communication tools like Slack and Zoom experienced severe issues, making it difficult for teams to collaborate.
Students and teachers also faced major hurdles as the learning platform Canvas went offline. This outage prevented access to virtual classrooms, assignments, and course materials, highlighting how deeply cloud services are woven into our professional and educational lives.

The outage’s reach extended into critical services, causing significant inconvenience. Major UK banks like Lloyds and Halifax displayed error messages, preventing customers from accessing their accounts online. This left many people unable to check their balances or make digital payments.
Travelers also felt the impact, with airlines like Delta and United reporting website and app issues. Some passengers found themselves unable to check in for flights online or view their reservations, creating stress and potential delays at airports during the busy morning rush.

The initial problem triggered a domino effect within AWS’s own complex systems. An internal subsystem responsible for monitoring server health began to fail because it, too, depended on the broken database.
To prevent a total collapse and allow for recovery, AWS had to deliberately slow down or throttle certain operations. This careful balancing act was necessary to stabilize the wobbling system, but it also prolonged the recovery time for many dependent services.
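When a service is being throttled like this, clients that immediately retry only deepen the congestion. The standard client-side remedy, which AWS’s own SDKs implement internally, is exponential backoff with jitter. Here is a minimal, hypothetical sketch of the idea (the `flaky` operation just simulates a throttled call):

```python
import random
import time

# A minimal sketch of exponential backoff with jitter -- the standard way
# clients cope when a service is throttling requests during recovery.
# Illustrative only; real AWS SDKs build this in.
def call_with_backoff(operation, max_attempts=5, base_delay=0.1):
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Wait longer after each failure, plus random jitter so that
            # thousands of clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Demo: an operation that fails twice (as if throttled), then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("throttled")
    return "ok"

print(call_with_backoff(flaky))  # succeeds on the third attempt
```

The jitter term is the important design choice: without it, every waiting client retries at the same moment and recreates the spike the throttling was meant to absorb.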

Outage tracker Downdetector logged more than 6.5 million user reports worldwide during the incident; per-country totals fluctuated across the day as the spike moved, so exact national breakdowns depend on the snapshot used.
Countries like Australia, the Netherlands, France, and Japan also saw hundreds of thousands of complaints. This global footprint demonstrated the immense scale of AWS’s infrastructure and how a single failure in one region can create an international incident.

Fixing a crisis of this scale was a complex and slow process, not a simple flip of a switch. Amazon engineers worked tirelessly, applying fixes and carefully monitoring the system’s response. They provided frequent public updates on their status page.
Even after the core technical issue was resolved, many services faced a backlog of delayed requests. This meant that for many users, apps and websites remained slow or unreliable for several more hours as the digital world slowly cleared its congested pipelines.

Authorities quickly confirmed this massive disruption was not the result of a cyberattack. Cybersecurity firms and experts clarified that there was no evidence of malicious activity or a security breach.
AWS said the incident stemmed from DNS-resolution problems that affected DynamoDB endpoints and internal AWS networking/monitoring automation.
In short, an automated DNS-management routine left behind a faulty record that its safeguards failed to repair, leaving many services unable to locate the servers they needed.

Internet monitoring firm Catchpoint’s CEO warned that the economic fallout could run very high, suggesting it might reach into the ‘hundreds of billions.’
But that figure is an estimate, not an audited total; other analysts offered lower, still-substantial hourly or sectoral cost estimates, and the true total will take time to calculate.
From paused factory lines to disrupted airline operations and lost sales for online stores, the economic ripple effect was profound. The incident served as a stark reminder of the immense economic value that flows through our digital infrastructure every single minute.

This wasn’t the first time this AWS region in northern Virginia, known as US-EAST-1, had been the epicenter of a major outage. It also experienced significant failures in 2021 and 2020, causing similar widespread internet disruptions.
As AWS’s oldest and largest region, many services are defaulted to it, creating a concentration of risk. This repeated issue raises questions about the resilience of this specific infrastructure and the wisdom of relying so heavily on a single point of failure.

As much of the internet struggled, Elon Musk took to his social media platform, X, to gloat. He pointed out that his service was unaffected and used the moment to promote its messaging features, taking several jabs at Amazon and its founder, Jeff Bezos, while his platform remained online. He also publicly criticised Signal, writing that he ‘doesn’t trust Signal anymore’; the comments were widely reported.

The outage served as a stark wake-up call about our collective dependence on a small number of tech giants like Amazon, Google, and Microsoft. One expert described it as a democratic failure, emphasizing how a single company’s problem can silence media, disrupt secure messaging, and halt essential services.
Many experts argued that this should be a catalyst for diversifying the digital ecosystem. The urgent call is for more companies to spread their services across multiple cloud providers to build a more resilient and robust online world for everyone.

Technically, AWS provides tools that allow developers to build apps that can withstand a failure in one data center. However, this often adds cost and complexity, leading some companies to cut corners for speed and savings.
The responsibility lies with individual companies to design their services with failure in mind. Building genuine redundancy, where systems can automatically switch to a backup provider, is key to surviving such outages without going completely offline.
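Designing “with failure in mind” can start as simply as trying a list of endpoints in order. The sketch below uses hypothetical endpoint names and a stubbed network call in which the primary region is down; real deployments rely on health checks, DNS failover, or load balancers rather than a hard-coded list:

```python
# A minimal sketch of multi-region failover: try a primary endpoint first,
# and fall back to a secondary one if it is unreachable. Endpoint names
# are hypothetical, and fetch() stubs out the network call.
PRIMARY = "https://api.us-east-1.example.com"
SECONDARY = "https://api.us-west-2.example.com"

def fetch(endpoint: str) -> str:
    # Placeholder for a real network request; here the primary "fails".
    if "us-east-1" in endpoint:
        raise ConnectionError("region unavailable")
    return f"response from {endpoint}"

def fetch_with_failover(endpoints):
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint)
        except ConnectionError as err:
            last_error = err  # remember the failure, try the next region
    raise last_error

print(fetch_with_failover([PRIMARY, SECONDARY]))
# falls back to the us-west-2 endpoint when us-east-1 is down
```

The cost-and-complexity trade-off the article mentions lives in everything this sketch omits: keeping data replicated across regions so the backup endpoint actually has something to serve.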

Amid the frustration, there was one piece of good news. Cybersecurity experts confirmed that this was a system failure, not a data breach. Your personal information, messages, and photos stored on affected apps were never at risk of being stolen or exposed.
The issue was that apps suffered from temporary amnesia, unable to locate the data they needed to run. Your information remained safely stored on servers, but the apps simply couldn’t find it or connect to it for several hours, which is why everything just froze.

So, what’s the takeaway from this massive digital hiccup? For individuals, it’s a reminder to occasionally back up important data offline and have alternative ways to complete tasks. For the tech industry, it’s a powerful lesson in the risks of putting too many eggs in one basket.
The goal is not to abandon cloud services, which power incredible innovation, but to build them better. The hope is that this outage will push companies to create a more distributed and resilient internet, one that can better withstand the inevitable next glitch.
Did this outage throw your day into chaos, or did you barely notice? We want to hear your story. Drop a comment below and let us know how it impacted you.
This slideshow was made with AI assistance and human editing.
Dan Mitchell has been in the computer industry for more than 25 years, getting started with computers at age 7 on an Apple II.
