Weighing the benefits
Even Microsoft would probably admit that, despite improved data compression and a resource governor to manage multiple workloads, SQL Server 2008 is not the most intuitive choice for this clustered, 'scaled-out' schema.
"SQL Server 2008 takes us to the next level, but that is within the 'scale-up' model," said Ted Kummert during a conference call with journalists this week to launch the upgraded database. Rather, Microsoft's recent acquisition of DATAllegro, a startup vendor focused on large data warehouses, "will take us to the greatest level of scale-out," he said.
There are several reasons, though, why Pan-STARRS went with SQL Server 2008.
One is cost. Deploying Pan-STARRS will cost just US$750,000, due to the low cost of the PC hardware and the heavy academic discounts offered by Microsoft for SQL Server and Windows Server 2008.
"People in academia are always operating on a shoestring budget, so we wanted to be able to create something others could emulate," Szalay said.
More important, however, is Microsoft's long involvement with the astronomical community, especially via its technical ambassador, Jim Gray. The noted database researcher, who disappeared at sea in early 2007 and is now presumed dead, was instrumental in building predecessor databases. These include TerraServer, a massive free Web archive of satellite pictures of the Earth stored in SQL Server, and the 40TB SkyServer, a similar repository of astronomical images.
Indeed, the distributed database platform that Pan-STARRS (and, it is hoped, other applications) will run on is called GrayWulf in Gray's honor.
"Gray worked with us for more than a decade. All the credit should go to him," Szalay said.
"He changed astronomy as we know it," said Maria A. Nieto-Santisteban, a software engineer at Johns Hopkins and Pan-STARRS' technical lead. "We still ask ourselves, 'How would Jim do this?'"