The Infrastructure Mess Causing Countless Internet Outages
Credit to Author: Lily Hay Newman | Date: Fri, 28 Jun 2019 16:38:21 +0000
In a weeks-long stretch in 2014, hackers stole thousands of dollars a day in cryptocurrency from owners. In 2017, internet outages cropped up around the United States for hours. Last year, Google Cloud suffered hours of disruptions. Earlier this month, a large swath of European mobile data was rerouted through the state-backed China Telecom. And on Monday, websites and services around the world—including the internet infrastructure firm Cloudflare—experienced hours of outages. These incidents may sound different, but they actually all resulted from problems—some accidental, some malicious—with a fundamental internet routing system called the Border Gateway Protocol.
The web is distributed, but it's also interconnected. It needs to be so that data can move around worldwide without all being controlled by a single entity. So every time you load a website or send an email, BGP is the system responsible for optimizing the route that data takes across these sprawling, intertwined networks. And when it goes wrong, the whole internet feels it.
Originally conceived in 1989 (on two napkins), the version of BGP used today remains largely unchanged since 1994. And though BGP has scaled surprisingly well, there's no denying that the internet is very different than it was 25 years ago. In fact, the way BGP was designed introduces risk of outages, manipulations, and data interception—all of which have come to pass.
The internet's backbone routers—massive industrial nodes usually run by internet service providers, not the Linksys at your house—each control a set of IP addresses and routes. ISPs and other large organizations use BGP to announce these routes to the world and calculate paths. Think of it like planning a cross-country drive: You need to know the different route options in each area, so you can stop at all the right corn mazes and the world's largest rocking chair without adding too much extra driving each day. But if your GPS is outdated, you could wind up at a dead end or on a new road that totally bypasses the salt flats.
On the internet, it's crucial for data to get where it's supposed to go, yet BGP hinges on something a little bit slippery: trust. The protocol wasn't designed to independently verify the route claims of individual networks. If these so-called autonomous systems accidentally announce bad routes—or are hijacked to broadcast inaccurate routes—data flows start to back up or reroute in haphazard ways that can lead to connectivity issues. It's like if hackers set up detour signs, or changed street names, to put you on a path to your in-laws' house instead of a waterpark. And if an attacker crafts one of these diversions carefully, they can even potentially control the flow of data to intercept it.
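To make the detour-sign analogy concrete, here is a minimal Python sketch of one way a bogus announcement can pull traffic away from its rightful owner. It relies on a general property of IP routing: when several announced prefixes cover a destination, routers prefer the most specific one. The function name `best_route`, the documentation-range prefixes, and the AS numbers are all illustrative assumptions, and real BGP route selection weighs many more attributes than prefix length.

```python
import ipaddress

def best_route(routes, destination):
    """Pick the announcement with the longest prefix covering `destination`.

    `routes` is a list of (prefix, origin_asn) pairs. This is a toy model:
    nothing here checks whether the origin is entitled to the prefix,
    which mirrors the trust assumption described above.
    """
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, asn))
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    net, asn = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), asn

# Hypothetical legitimate announcement (documentation prefix and ASN).
legit = [("198.51.100.0/24", 64496)]
# A second network announces a more specific slice of the same space.
hijacked = legit + [("198.51.100.0/25", 64511)]

before = best_route(legit, "198.51.100.10")
after = best_route(hijacked, "198.51.100.10")
print(before)
print(after)
```

In this toy table, the second network never has to prove anything. Announcing a narrower prefix is enough for traffic bound for `198.51.100.10` to follow the new route.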
"It’s a protocol that was built with a trust-based mind-set," says Jérôme Fleury, director of network engineering for Cloudflare. "There was no security mechanism at the time; there was pretty much nothing except trust. And it worked actually pretty well for a lot of years. But the main issue right now is you find a lot of bad actors on the internet, and you will find bad actors that can actually operate routers now. And people are also prone to mistakes. So the question is, how do we move the needle from trust-based BGP routing to a framework that has authentication?"
BGP isn't the only historic internet system with trust issues. Another fundamental protocol, known as the Domain Name System, has dealt with similar issues. If BGP is the internet's navigational system, DNS is its address book. DNS hijacking has become a major security issue around the world, and the Department of Homeland Security even issued an emergency directive in January aimed at defending DNS accounts.
As with DNS, though, concerns about BGP date back decades. In 1998, for example, a group of hackers from the L0pht collective famously testified before Congress that they could take down the internet in 30 minutes by attacking BGP. Ten years later, Kim Zetter assessed the state of BGP insecurity in WIRED, writing, "Government and industry officials have known about the problem for more than a decade and yet have made little progress in addressing it, despite the national security implications."
Another decade on, there are thousands of BGP routing incidents each year. And while most are minor and accidental, an increasing number are targeted, malicious attacks. In December, NSA senior adviser Rob Joyce specifically cited BGP insecurity as a pressing threat in international cybersecurity defense.
As a result of this renewed urgency over the past few years, the internet preservation and standards community has begun to make real progress on promoting secure BGP configurations and adding route authentication. In 2017, the National Institute of Standards and Technology collaborated with DHS to develop a set of routing defense standards published by the Internet Engineering Task Force. Last year, researchers published a BGP hijacking defense framework for network operators funded by the National Science Foundation, DHS, and the European Research Council. And since 2014, a growing consortium of network operators and the Internet Society have been codifying and promoting BGP best practices through the Mutually Agreed Norms for Routing Security, or MANRS. Perhaps most important, the community has encouraged adoption of a tool to cryptographically confirm the validity of BGP routes, known as Resource Public Key Infrastructure, or RPKI.
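For a rough sense of what a Resource Public Key Infrastructure check does once the cryptography is out of the way, the Python sketch below follows the route origin validation logic described in RFC 6811. A route is "valid" if some authorization covers its prefix, names its origin network, and permits its prefix length; "invalid" if authorizations cover the prefix but none match; and "not-found" if nothing covers it at all. The `Roa` record, the function name `validate_origin`, and the sample prefixes and AS numbers are assumptions for illustration. Verifying the signatures on the authorizations themselves is left out entirely.

```python
import ipaddress
from typing import NamedTuple

class Roa(NamedTuple):
    """A simplified, already-verified route origin authorization (hypothetical shape)."""
    prefix: str      # address block the holder has authorized
    max_length: int  # most specific prefix length that may be announced
    asn: int         # network allowed to originate the block

def validate_origin(roas, prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip authorizations for the other IP version or unrelated space.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"

roas = [Roa("198.51.100.0/24", 24, 64496)]

print(validate_origin(roas, "198.51.100.0/24", 64496))  # authorized origin
print(validate_origin(roas, "198.51.100.0/25", 64496))  # more specific than allowed
print(validate_origin(roas, "198.51.100.0/24", 64511))  # wrong origin network
print(validate_origin(roas, "203.0.113.0/24", 64511))   # no authorization on file
```

What a network does with each verdict is a policy choice. A common approach is to drop "invalid" routes while still accepting "not-found" ones, since much of the internet's address space has no authorization on file yet.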
Even with all of these initiatives gaining momentum, it remains difficult to get every ISP and network operator to implement those changes. But many major players, like AT&T, the Swedish infrastructure group NetNod, the massive Japanese telecom NTT Communications, and Cloudflare itself, are at least on board with implementing route filtering and RPKI.
The patchwork problem was on full display with the Cloudflare incident this week. Pennsylvania steel company Allegheny Technologies uses two internet providers for connectivity. It received accidental, inaccurate routing information from one provider, a small Midwest ISP, and unintentionally passed it on to its other provider, Verizon. The smaller ISP started the routing error, but Verizon—an internet backbone behemoth with massive resources—also had not implemented the BGP filters and authentication checks that would have caught the mistake. Without these protections in place, Verizon's other customers worldwide, including Cloudflare, experienced outages and failures. Verizon did not return a request for comment about the incident.
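The kind of BGP filter at issue can be sketched in a few lines of Python. The idea is that a provider accepts from a customer only routes for address space that customer is known to hold, and rejects the rest before they spread. This is a simplified illustration under stated assumptions, not a description of any provider's actual configuration. The function name `filter_customer_routes` and the prefixes are hypothetical, and production filters are typically built from routing registry data and paired with limits on how many prefixes a customer may announce.

```python
import ipaddress

def filter_customer_routes(allowed_prefixes, announced):
    """Split a customer's announcements into (accepted, rejected).

    A route is accepted only if it equals, or falls inside, a prefix
    the customer is registered to announce (an assumed policy).
    """
    allowed = [ipaddress.ip_network(p) for p in allowed_prefixes]
    accepted, rejected = [], []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        ok = any(net.version == a.version and net.subnet_of(a) for a in allowed)
        (accepted if ok else rejected).append(prefix)
    return accepted, rejected

# Hypothetical customer registered for a single block.
allowed = ["192.0.2.0/24"]
# The customer passes along its own route plus one learned from elsewhere.
announced = ["192.0.2.0/24", "198.51.100.0/24"]

accepted, rejected = filter_customer_routes(allowed, announced)
print(accepted)
print(rejected)
```

With a check like this at the provider's edge, the route the customer learned from its other provider would be rejected there instead of being passed on to the rest of the provider's network.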
"ATI’s operations were not at cause nor were they impacted," an Allegheny spokesperson told WIRED. "We look forward to Verizon’s rapid resolution of the issue on behalf of all users."
If Monday's incident had been caused by hackers, they could have used the cascade of outages as a denial of service attack, blocking users' access to popular sites and web services around the world. In other types of BGP attacks, seemingly like the one on European telecoms earlier this month, an attacker can strategically reroute traffic for surveillance or intelligence-gathering. This is why BGP defenses are so crucial. They not only protect against service interruptions caused by mistakes, they also safeguard basic security and privacy on the internet.
"It’s a testament to the brilliance of these early Internet protocols, including BGP, that we’ve been able to scale them far beyond their intended design criteria," says Roland Dobbins, principal engineer at the network security firm Netscout Arbor. "Nonspecialists kind of view the internet as this high-tech, gleaming thing like the bridge of the starship Enterprise. It’s not like that at all. It’s more like an 18th-century Royal Navy frigate. There’s a lot of running around and screaming and shouting and pulling on ropes to try to get things going in the right direction."