Event search firm Zvents is releasing a massively parallel database server, based on a published Google design, as an open source project. The new software, Hypertable, is designed to scale to 1,000 nodes, all commodity PCs, says Doug Judd, principal search architect for Zvents.
Moving the project from in-house to open source is a way for a relatively small company to get the infrastructure software it needs, Judd says. "We aren't in the database business. This is the kind of infrastructure that should be in open source. This is not company proprietary stuff," he says.
The current Hypertable version is a 0.9 alpha release and has been tested on about 10 nodes so far, Judd says. But Yahoo developers have expressed interest in "kicking the tires" and testing on more nodes. Yahoo developers are already involved in another way: Hypertable stores its data on a distributed filesystem, and the database developers are currently using the Apache Software Foundation's Hadoop. Yahoo supports Hadoop by employing lead developer Doug Cutting and his team, and by providing infrastructure.
The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.
The API for Hypertable is slightly different from Bigtable's, Judd says. Although it is not a full SQL database, it is more featureful than a simple key/value store such as Brad Fitzpatrick's memcached. Memcached is widely used along with a conventional SQL database in high-traffic web sites, to cache chunks of HTML and XML and save an application from having to query the main database.
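The caching pattern described above, where an application checks a key/value cache before querying the main database, might be sketched roughly as follows. This is only an illustration under stated assumptions: `FragmentCache` is a hypothetical in-process stand-in for a memcached client, SQLite stands in for the conventional SQL database, and the `events` table, the `render_event` function, and the key format are invented for the example. None of it reflects Hypertable's or Zvents' actual code.

```python
import sqlite3
from html import escape


class FragmentCache:
    """Hypothetical in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, as memcached clients commonly do.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def render_event(db, cache, event_id, stats):
    """Return an HTML fragment for an event, consulting the cache first."""
    key = f"event_html:{event_id}"  # assumed key format
    fragment = cache.get(key)
    if fragment is not None:
        stats["hits"] += 1
        return fragment  # cache hit: the SQL database is not queried

    stats["misses"] += 1
    row = db.execute(
        "SELECT title FROM events WHERE id = ?", (event_id,)
    ).fetchone()
    if row is None:
        return None  # unknown event: nothing to cache

    fragment = f"<h1>{escape(row[0])}</h1>"
    cache.set(key, fragment)  # store the rendered chunk for later requests
    return fragment


def demo():
    # Assumed schema, invented for this example.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT)")
    db.execute("INSERT INTO events (id, title) VALUES (1, 'Jazz Night')")

    cache = FragmentCache()
    stats = {"hits": 0, "misses": 0}
    first = render_event(db, cache, 1, stats)   # miss: reads the database
    second = render_event(db, cache, 1, stats)  # hit: served from the cache
    return first, second, stats
```

Calling `demo()` renders the same event twice; only the first call touches the database. A real deployment would also need an expiry or invalidation policy so that cached fragments do not outlive changes to the underlying rows.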