How Facebook Catches Bugs in Its 100 Million Lines of Code
Credit to Author: Lily Hay Newman | Date: Thu, 15 Aug 2019 21:03:59 +0000
Facebook doesn't have the most stellar privacy and security track record, especially given that many of its notable gaffes were avoidable. But with billions of users and a gargantuan platform to defend, it's not easy to catch every flaw in the company's 100 million lines of code. So four years ago, Facebook engineers began building a customized assessment tool that not only checks for known types of bugs but can fully scan the entire codebase in under 30 minutes—helping engineers catch issues in tweaks, changes, or major new features before they go live.
The platform, dubbed Zoncolan, is a "static analysis" tool that maps the behavior and functions of the codebase and looks for potential problems in individual branches, as well as in the interactions of various paths through the program. Having people manually review endless code changes all the time is impractical at such a large scale. But static analysis scales extremely well, because it sets "rules" about undesirable architecture or code behavior, and automatically scans the system for these classes of bugs. See it once, catch it forever. Ideally, the system not only flags potential problems but gives engineers real-time feedback and helps them learn to avoid pitfalls.
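To make the "rules" idea concrete, here is a minimal sketch of a rule-based scanner built on Python's standard `ast` module. It is not Zoncolan's implementation, which Facebook has not published; the rule names (`no-eval`, `no-password-logging`) and the helper `scan` are hypothetical, chosen only to illustrate how a rule written once can be run against any code automatically.

```python
import ast

# Hypothetical rules for illustration only. Each rule is a predicate
# that inspects a single syntax-tree node and returns True on a match.

def calls_eval(node):
    """Flag direct calls to the built-in eval()."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval")

def logs_password(node):
    """Flag print(...) calls whose arguments use a variable with
    'password' in its name (a crude stand-in for a logging rule)."""
    if not (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"):
        return False
    return any(isinstance(n, ast.Name) and "password" in n.id.lower()
               for arg in node.args
               for n in ast.walk(arg))

RULES = {
    "no-eval": calls_eval,
    "no-password-logging": logs_password,
}

def scan(source):
    """Parse source code without running it and return a list of
    (rule_name, line_number) findings, ordered by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for name, rule in RULES.items():
            if rule(node):
                findings.append((name, node.lineno))
    return sorted(findings, key=lambda f: (f[1], f[0]))

if __name__ == "__main__":
    sample = "x = eval(user_input)\nprint('ok')\nprint(user_password)\n"
    for rule_name, line in scan(sample):
        print(f"line {line}: {rule_name}")
```

Adding a new entry to `RULES` is enough to check every file that `scan` is given, which is the sense in which a static analysis rule, once written, keeps catching the same class of mistake.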
"Every time an engineer makes a proposed change to our codebase, Zoncolan will start running in the background, and it will either report to that engineer directly or it will flag to one of our security engineers who’s on call," says Pieter Hooimeijer, a security engineering manager at Facebook. "So it runs thousands of times a day, and found on the order of 1,500 issues in calendar year 2018."
"It is by far the most valuable in the identification of known exposures. However, it doesn’t cover everything."
David Kennedy, TrustedSec
Static analysis tools don't find new types of vulnerabilities on their own; they can only catch things based on the rules they've been directed to follow. But they're a useful workhorse for catching the same types of mistakes again and again, or retroactively pulling out a set of bugs from a single new rule. They're also nowhere near unique to Facebook; static analysis tools are widely used in the security community and broader development industry. But Hooimeijer notes that Zoncolan is especially effective, because it is custom-built to comprehensively map Facebook's specific code. Hooimeijer says that before Facebook disclosed in March that it had accidentally stored hundreds of millions of user passwords in plain text, the company fed a rule about the bug into Zoncolan to scan the codebase for similar issues that could be lurking. And found a few.
"Four years ago we would have had to scramble a bunch of security engineers all at once to start combing the code manually looking for additional issues," Hooimeijer says about the incident. "Instead, we used Zoncolan to ensure there were no additional issues in our code base that were similar in nature. In this case we created new rules that found similar issues in practice." Inspiration for new rules that expand Zoncolan's detection capabilities come from a number of sources within Facebook, including the company's bug bounty program.
Zoncolan has a particularly tailored approach to hunting security bugs, versus more general static analysis tools that look for a broad array of design and performance bugs. It also focuses on recognizable data flows and patterns, as a way of cutting down on the false positives typical of static analysis. Still, Facebook's not the only company to customize a system to its liking; Google has its own custom-built static analysis tool as well, evaluating the company's enormous 2-billion-line codebase.
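The article does not describe how Zoncolan tracks data flows, so the following is only a toy sketch of the general idea, often called taint tracking: mark data from an untrusted "source," follow it through assignments, and report when it reaches a sensitive "sink" without passing through a sanitizer. The function names `get_request_param`, `run_query`, and `escape` are hypothetical, and the sketch handles only straight-line, top-level statements.

```python
import ast

# Hypothetical names, assumed for illustration only.
SOURCES = {"get_request_param"}  # returns untrusted, user-controlled data
SINKS = {"run_query"}            # should never receive untrusted data
SANITIZERS = {"escape"}          # output is treated as safe

def call_name(node):
    """Return the called function's name for simple calls, else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None

def is_tainted(expr, tainted):
    """Decide whether an expression may carry untrusted data."""
    name = call_name(expr)
    if name in SANITIZERS:
        return False
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    # Otherwise the expression is tainted if any sub-expression is.
    return any(is_tainted(child, tainted)
               for child in ast.iter_child_nodes(expr))

def find_flows(source):
    """Return line numbers where tainted data reaches a sink.
    Only top-level statements are processed, in source order."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(
                    is_tainted(arg, tainted) for arg in node.args):
                findings.append(node.lineno)
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings

if __name__ == "__main__":
    sample = (
        "user = get_request_param('name')\n"
        "query = 'SELECT * FROM t WHERE n = ' + user\n"
        "run_query(query)\n"
        "safe = escape(user)\n"
        "run_query('SELECT ' + safe)\n"
    )
    print(find_flows(sample))
```

In this sample only the first `run_query` call is reported, because the second receives escaped input. Restricting reports to a known source-to-sink path, rather than flagging every call to `run_query`, is one way a flow-based check can reduce false positives.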
"Any company that has a good software development life cycle has source code analyzers to ensure they weed out exposures prior to moving into production," says David Kennedy, CEO of the corporate incident response consultancy TrustedSec. "Most mature organizations leverage static code analyzers, because it is by far the most valuable in the identification of known exposures. However it doesn’t cover everything."
Kennedy points out that a tool like Zoncolan would not have spotted the permission issues that led to Facebook's 30 million account data breach in September. "A source code analyzer would not have found that," he says. And many of Facebook's most serious issues over the last few years have been policy-based privacy problems unrelated to accidental code bugs.
Hooimeijer echoes that Zoncolan is not a silver bullet. But he says that given the investment Facebook has made in the platform, he hopes that a version of the tool will someday be available as an open source static analyzer for other organizations to use. The attributes that make Zoncolan so effective for bug hunting within Facebook's code could generalize into a broadly useful tool. But in an open source version meant to run outside of Facebook, the company would also need to build in flexibility for more diverse environments.
One step toward this goal is a code checker called Pyre that Facebook released open source in 2018 for the popular coding language Python. The tool doesn't have the full scope and security focus of Zoncolan, but is an example of the types of resources Facebook plans to release.
"We've invested a lot of effort into building this, so that's the trajectory: Zoncolan but for Python," he says. "We want to share the awesomeness outside Facebook as well."
The security community will always welcome another high-quality, open source tool. But Facebook needs to keep honing every defense at its disposal to catch user security issues before they snowball.