“If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now,” Mike Olson, the CEO of Hadoop specialist Cloudera, said at recent event in Silicon Valley. According to Charles Zedlewski, vice president of products at Cloudera, the company was already aware of Spanner — after recruiting some ex-Google engineers — and it may eventually incorporate ideas from the paper into its software.
Facebook is already building a system that’s somewhat similar to Spanner, in that it aims to juggle information across multiple data centers. Judging from our discussions with Facebook about this system — known as Prism — it’s quite different from Google’s creation. But it shows that other outfits are now staring down many of the same data problems Google first faced in years past.
The Spanner paper lists many authors, but two stand out: Jeff Dean and Sanjay Ghemawat. After joining Google from the research operation at DEC — the bygone computer giant — Dean and Ghemawat helped design three massive software platforms that would have a major impact on the rest of the internet. MapReduce and the Google File System gave rise to Hadoop, while BigTable helped spawn an army of “NoSQL” databases suited to storing and retrieving vast amounts of information.
Spanner draws on BigTable, but it goes much further. Whereas BigTable is best used to store information across thousands of servers in a single data center, Spanner expands this idea to millions of servers and multiple data centers.
First Let’s Synchronize our Atomic Watches
“The linchpin of Spanner’s feature set is TrueTime.”
To understand TrueTime, you have to understand the limits of existing databases. Today, there are many databases designed to store data across thousands of servers. Most were inspired either by Google’s BigTable database or a similar storage system built by Amazon known as Dynamo. They work well enough, but they aren’t designed to juggle information across multiple data centers — at least not in a way that keeps the information consistent at all times.
According to Andy Gross — the principal architect at Basho, whose Riak database is based on Amazon Dynamo — the problem is that servers must constantly communicate to ensure they correctly store and retrieve data, and all this back-and-forth ends up bogging down the system if you spread it across multiple geographic locations. “You have to a do a whole lot of communication to decide the correct order for all the transactions,” Gross says, “and the latencies you get are typically prohibitive for a fast database.”
Rather than try to improve the communication between servers, Google spreads clocks across its network. It equips various master servers with GPS antennas or atomic clocks, and — working in tandem with the TrueTime APIs — these time keepers keep the entire network in sync. Or thereabouts.
“A lot of current research [outside of Google] focuses on complicated coordination protocols between machines…but this takes a completely different approach,” Gross says. “By using highly accurate clocks and a very clever time API, Spanner allows server nodes to coordinate without a whole lot of communication.”
‘Just having a peek into the way Google does this…is very valuable.’
— Andy Gross
The rub is that you can’t use Spanner unless you add hardware to your servers. In its paper, Google says the atomic clocks aren’t expensive, and 10gen’s Max Schireson can see other outfits installing similar equipment. But both Basho’s Gross and Cloudera’s Zedlewski believe the cost would be prohibitive for general use.