Internet Suffers ‘Temporary Amnesia’ After Amazon Web Services Experiences Outage

A major Amazon Web Services (AWS) outage on Monday disrupted large swaths of the internet, affecting thousands of high-traffic websites, mobile apps, and critical online services.

Impacted sites and apps included Reddit, Snapchat, Venmo, Fortnite, Roblox, Coinbase, Disney+, and numerous banking and government platforms.

Amazon reported “increased error rates and latencies” starting around 3 a.m. ET that involved its facility in Northern Virginia. Users reported trouble with popular websites and apps including Duolingo and the online games Roblox and Fortnite. Financial service companies like Coinbase, Robinhood and Venmo also reported disruptions, as did the companies that operate the chatbots Perplexity and ChatGPT. Amazon said its main website was affected. United Airlines, Canva, Reddit and Flickr also acknowledged problems with their websites. The Associated Press, NPR and The New York Times’ Games also said they had issues.

The outage began around 3:00 am Eastern Time, originating in AWS’s US-EAST-1 data center region in Northern Virginia, one of its largest and most critical hubs. Amazon’s official health dashboard cited increased error rates and latencies across multiple AWS services due to a Domain Name System (DNS) resolution issue linked to DynamoDB, a core AWS database system.

DNS resolution issues occur when the process of converting a domain name (like “example.com”) into its corresponding IP address fails or is delayed. In this outage, the failure prevented client systems from locating and retrieving stored data, which one expert described as the internet suffering “temporary amnesia.”
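For readers who want to see what a failed lookup looks like from a client's side, here is a minimal Python sketch. It is an illustration only and not how AWS or DynamoDB handle resolution internally. The function name `resolve` and its `lookup` parameter are hypothetical names chosen for this example; `socket.getaddrinfo` and `socket.gaierror` come from Python's standard library.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if resolution fails.

    `lookup` is a hypothetical injection point for this sketch so the failure
    path can be simulated without touching the network.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # Resolution failed: the name cannot be turned into an address,
        # so the client has nowhere to send its request.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({entry[4][0] for entry in results})
```

When the lookup raises an error, the caller gets an empty list: the service behind the name may be running normally, but the client cannot find it.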

The company said it “identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints.” It ran into more problems as it tried to resolve the outage, but it was eventually able to fix everything. “By 3:01 PM, all AWS services returned to normal operations,” it said.

At about 4:30 PM ET on October 20, things seemed to be returning to normal. Apps like Venmo and Lyft, which had been either slow to respond or completely nonresponsive, appeared to be running smoothly.

There were over 6.5 million reports of connectivity issues from around the world.

Downdetector says it received over 6.5 million global reports of connectivity problems, including 1.4 million+ from the US and 800K+ in the UK.

“The lesson here is resilience,” says Luke Kehoe, an industry analyst at Ookla. “Many organizations still concentrate critical workloads in a single cloud region. Distributing critical apps and data across multiple regions and availability zones can materially reduce the blast radius of future incidents.”

Those incidents, he says, are probably “becoming slightly more frequent as companies are encouraged to completely rely on cloud services, [but] this kind of outage, where a foundational internet service brings down a large swathe of online services, only happens a handful of times in a year.”
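The multi-region idea Kehoe describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production failover design: the function `fetch_with_failover` and its `fetch` callback are hypothetical names, and the second region in the usage note is only an example of an alternate location.

```python
def fetch_with_failover(regions, fetch):
    """Try each region in order and return (region, result) from the first that works.

    `fetch` is a hypothetical callable that takes a region name and either
    returns data or raises an exception if that region is unavailable.
    """
    failed = []
    for region in regions:
        try:
            return region, fetch(region)
        except Exception:
            # This region is unavailable; remember it and try the next one.
            failed.append(region)
    # Every region failed, so there is nothing left to fall back to.
    raise RuntimeError("all regions failed: " + ", ".join(failed))
```

A caller might pass `["us-east-1", "us-west-2"]` so that trouble in the first region shifts requests to the second. Real deployments also have to replicate data between regions, which this sketch leaves out.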

Amazon was also impacted by the outage.

The outage also brought down critical tools inside Amazon. Warehouse and delivery employees, along with drivers for Amazon’s Flex service, reported on Reddit that internal systems were offline at many sites. Some warehouse workers were instructed to stand by in break rooms and loading areas during their shifts, and they couldn’t load Amazon’s Anytime Pay app, which lets employees access a portion of their paycheck immediately. Seller Central, the hub used by Amazon’s third-party sellers to manage their businesses, was also knocked offline by the outage.

As of now, everything seems to be back to normal for internet connectivity. However, much of the content remains…questionable.

Tags: Amazon, Technology
