21.10.2021

The fall of Facebook: Reflections on a (preventable) disaster

On October 6, 2021, The Wall Street Journal reported that Facebook's management planned to suspend a series of scheduled updates and projects. Two days earlier, on October 4, the platform had suffered its most serious outage since 2008. The following day, on October 5, a former Facebook employee testified before the US Congress, accusing the social network of neglecting user safety in favor of profits. These events have raised serious questions about the way the company operates, as well as the internal infrastructure it runs on. SLDDigital.com invited information security experts to weigh in on the factors that led to this colossal outage.
 

So, why did Facebook fail so hard?

It all started with a change to the BGP (Border Gateway Protocol) configuration on the backbone routers that carry traffic between Facebook's data centers. The change triggered a snowballing loss of connectivity between those data centers and the rest of the internet, including the global DNS system. The failure also cut off Facebook's own DNS servers, so WhatsApp and Instagram stopped working, too. It was as if someone had unplugged the entire data center from the network.
 
“There was a failure in the DNS servers. Imagine that the phone book on your phone contains a contact's first and last name, but not the phone number itself. So, when you try to call that contact, the system just doesn’t know where to route the call,” said Alexander Dvoryansky, Director of Communications at InfoSecurity.
 
The BGP change that caused the failure is described in detail on the blog of Cloudflare, a global provider of CDN services, DDoS protection, secure access to resources and DNS servers, as well as services to optimize app performance.
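
The phone book analogy above can be sketched in a few lines of Python. This is a toy model, not a description of Facebook's actual setup: the addresses come from reserved documentation ranges, and the names `ROUTES`, `NAME_SERVERS`, `RECORDS`, `is_reachable` and `resolve` are made up for illustration. It assumes that a DNS server can be reached only while a BGP-announced prefix covers its address.

```python
import ipaddress

# Hypothetical data for illustration only: documentation-range addresses,
# not Facebook's real prefixes or name servers.
PREFIX = ipaddress.ip_network("192.0.2.0/24")
ROUTES = {PREFIX}                                  # prefixes currently announced via BGP
NAME_SERVERS = {"example.com": ipaddress.ip_address("192.0.2.53")}
RECORDS = {"example.com": "198.51.100.10"}         # the answer that name server would give


def is_reachable(address, routes):
    """An address is reachable only if some announced prefix covers it."""
    return any(address in prefix for prefix in routes)


def resolve(domain, routes):
    """Return the IP for a domain, or None if its name server cannot be reached."""
    server = NAME_SERVERS.get(domain)
    if server is None or not is_reachable(server, routes):
        # The contact is in the phone book, but the number cannot be looked up.
        return None
    return RECORDS[domain]


before = resolve("example.com", ROUTES)
# Simulate the configuration change: the prefix is withdrawn from BGP.
after = resolve("example.com", ROUTES - {PREFIX})
print(before, after)
```

While the prefix is announced, the lookup returns an address. Once it is withdrawn, the record still exists, but nobody can reach the server that holds it.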
 

Why did it take so long for Facebook to sort out the outage?

The failure also disrupted Facebook's internal information and communication systems. Employees, most of whom were working remotely, couldn’t connect to the infrastructure or get in touch with their colleagues. Key network engineers were working remotely as well, which complicated things significantly.
 
In the first hours of the blackout, a New York Times reporter posted on Twitter that Facebook employees couldn’t enter the buildings to assess the severity of the problem because their passes for doors and checkpoints had stopped working. As a result, it took over six hours to deal with this massive failure.
 
Mikhail Malyshev, who leads the Softline team that develops information security solutions, believes that organizational rather than technical reasons slowed things down:
 
“There were several factors that affected this situation:
 
  1. The lack of a competent on-call service inside the data center to solve the problem on the spot.
  2. The lack of a backup communication channel with the data center’s infrastructure, which would have meant that the problem could have been solved quicker, even remotely.
  3. The failure of the data center's access control system, whose authorization was tied to the Facebook domains that were missing from the network at that time.”
 

The exploding bomb effect

Following the events of October 4, Facebook management decided to suspend the release of a series of updates. According to The Wall Street Journal, the reason for this caution is The Facebook Files investigation, which is based in part on testimony from Frances Haugen, a former employee of Facebook Inc.
 
“The company’s management knows how to make Facebook and Instagram more secure, but they aren’t making the necessary changes,” Frances Haugen said at a hearing in the US Congress on October 5, 2021. She went on to openly accuse Instagram of harming the psychological health of children and adolescents, and talked about how Facebook perpetuates modern slavery and ethnic conflict. Facebook's algorithm, introduced in 2018, was also publicly condemned. The social network tends to add fuel to the flames of user debates on popular posts, bumping up those posts' visibility, while the posts of ordinary users take second place to those of bloggers and influencers. This, apparently, made the network look “evil”. Facebook's engagement-based ranking system not only promotes harmful and overly engaging content, she said, but “literally incites ethnic violence”, and she cited Ethiopia as an example.
 
After several public hearings in Congress, lawmakers began discussing tougher regulations for tech corporations in general, and for Facebook in particular. They have put forward a number of legislative proposals, including bills that would force companies like Facebook to be more transparent about the spread of disinformation and other malicious content. “This study has a real bomb-detonating effect,” said Senator Richard Blumenthal, a Connecticut Democrat who presided over the hearing.
 

How can businesses minimize risks?

An experienced CIO will always do their best to minimize the chance of a disruption on the scale seen at Facebook. Here’s a list of measures that you can, and should, take to avoid a disaster like this:
 
  1. Hire experienced engineers and network architects;
  2. Don't keep all systems under the same domain;
  3. Ensure you have backup communication channels and an access control system that keeps working when the main network is down;
  4. Conduct regular drills to prepare for different scenarios.
“This unintentional system shutdown due to configuration errors was an accident. It’s worse when it’s deliberate, for instance as the result of targeted attacks by hackers. The damage can be an order of magnitude greater,” Mikhail Malyshev said.
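
The second and third measures above can be checked with a simple inventory audit. The sketch below is only an illustration under stated assumptions: the system names, the domains `corp.example` and `oob.example`, and the helpers `shared_dependencies` and `survivors` are all hypothetical, and a real audit would draw on an actual asset inventory.

```python
from collections import defaultdict

# Hypothetical inventory: which domains each critical system depends on.
SYSTEMS = {
    "public website": {"corp.example"},
    "internal chat": {"corp.example"},
    "door badge readers": {"corp.example"},
    "out-of-band console": {"oob.example"},
}


def shared_dependencies(systems, threshold):
    """Return the domains that at least `threshold` systems depend on."""
    users = defaultdict(set)
    for system, domains in systems.items():
        for domain in domains:
            users[domain].add(system)
    return {d: sorted(s) for d, s in users.items() if len(s) >= threshold}


def survivors(systems, failed_domain):
    """List the systems that keep working if `failed_domain` stops resolving."""
    return sorted(
        name for name, domains in systems.items() if failed_domain not in domains
    )


risky = shared_dependencies(SYSTEMS, threshold=3)
still_up = survivors(SYSTEMS, "corp.example")
print(risky)
print(still_up)
```

In this made-up inventory, the website, the chat and the door badges all hang on one domain, and only the out-of-band console survives its loss. A drill of the kind mentioned in the fourth measure would confirm whether that console is enough to restore the rest.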
 
