Global Six-Hour Facebook, WhatsApp and Instagram Outage

Facebook’s six-hour outage, which prevented users from refreshing their feeds or sending messages, was caused by “configuration changes on the backbone routers,” according to the company. In its initial statement, Facebook did not specify exactly what the changes were.

The changes caused “issues” that interrupted the flow of traffic between routers in Facebook’s data centers around the world.

This was the longest downtime for Facebook since 2008, when a bug knocked the site offline for about a day, affecting about 80 million users. The platform currently has around 3 billion users.

Security experts identified the problem as a Border Gateway Protocol (BGP) withdrawal of the IP address prefixes that hosted Facebook’s Domain Name System (DNS) servers. With those routes gone, users could not resolve Facebook and related domain names, and so could not reach the services.
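
To illustrate why a withdrawn prefix makes a nameserver unreachable, here is a minimal Python sketch of a route lookup. It is a toy model, not how real BGP routers work: the routing table is a plain set, and the prefixes and nameserver address are hypothetical values taken from the reserved documentation ranges, not Facebook's real addresses.

```python
import ipaddress

def has_route(routing_table, address):
    """Return True if any announced prefix in the table covers the address."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(prefix) for prefix in routing_table)

# Hypothetical values (documentation ranges), chosen only for illustration.
NAMESERVER_IP = "192.0.2.53"
routes = {"192.0.2.0/24", "203.0.113.0/24"}

reachable_before = has_route(routes, NAMESERVER_IP)

# Model the BGP withdrawal: the prefix covering the nameserver disappears.
routes.discard("192.0.2.0/24")
reachable_after = has_route(routes, NAMESERVER_IP)

print(reachable_before, reachable_after)  # True False
```

Once no announced prefix covers the nameserver's address, a resolver has no path to send its query along, so name resolution fails even though the nameserver itself may still be running.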

Cloudflare reported that at 15:39 UTC, Facebook made a significant number of BGP updates, including the withdrawal of routes to the IP prefixes that contained all of its authoritative nameservers. This made Facebook’s DNS servers unreachable from the Internet. By 15:50 UTC, Facebook’s domains had expired from the caches of all major public resolvers. A little before 21:00 UTC, Facebook resumed announcing BGP updates, and Facebook’s domain name became resolvable again at 21:05 UTC.
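
The intervals implied by Cloudflare's timestamps can be worked out with a short Python sketch. The variable names are illustrative, and the times are modeled simply as offsets from midnight UTC on the day of the outage.

```python
from datetime import timedelta

def utc(hours, minutes):
    """Represent a time of day as an offset from midnight UTC."""
    return timedelta(hours=hours, minutes=minutes)

# Timestamps reported by Cloudflare, as quoted in the paragraph above.
routes_withdrawn = utc(15, 39)
caches_expired = utc(15, 50)
resolvable_again = utc(21, 5)

cache_drain = caches_expired - routes_withdrawn
dns_unresolvable = resolvable_again - routes_withdrawn

print(cache_drain)       # 0:11:00
print(dns_unresolvable)  # 5:26:00
```

By these timestamps, cached records masked the withdrawal for about 11 minutes, and the domain was unresolvable for roughly five and a half hours, measured from the route withdrawal to the point at which names resolved again.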

On October 5, Facebook’s engineering team published a blog post explaining the cause of the outage. During maintenance, a command was run to assess the global backbone capacity, and that command accidentally disconnected all of Facebook’s data centers. Although Facebook’s DNS servers ran on a separate network, they were designed to withdraw their BGP routes if they could not connect to Facebook’s data centers. When the backbone went down, they did exactly that, making it impossible for the rest of the Internet to connect to Facebook.
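
The withdraw-on-failure behavior described in the blog post can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not Facebook's actual implementation: the `DnsSite` class, its `health_check` method, and the prefixes are all hypothetical names and values invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class DnsSite:
    """Hypothetical model of a DNS location that announces one prefix."""
    prefix: str
    announced: bool = True

    def health_check(self, can_reach_data_centers: bool) -> None:
        # Keep the route announced only while the data centers are reachable;
        # otherwise withdraw it so traffic is not sent to an unhealthy site.
        self.announced = can_reach_data_centers

def announced_prefixes(sites):
    """Prefixes the rest of the Internet can still see."""
    return {site.prefix for site in sites if site.announced}

# Hypothetical prefixes from the reserved documentation ranges.
sites = [DnsSite("192.0.2.0/24"), DnsSite("198.51.100.0/24")]

# The backbone failure is global, so every site fails its check at once.
backbone_up = False
for site in sites:
    site.health_check(backbone_up)

print(announced_prefixes(sites))  # set()
```

The design is sensible when a single site loses connectivity, because traffic shifts to the sites that are still healthy. In this sketch, as in the outage, the failure hits every site simultaneously, so all routes are withdrawn and nothing remains announced.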