From magnifying glasses to mega-storage
Astronomers first began storing data digitally in the mid-1970s, shortly after they began replacing conventional photographic plates with digital camera technology.
Even so, digital cameras were far more efficient than photographic plates, which required astronomers to hunch over them with magnifying glasses, counting galaxies and stars. But digital image resolution at the time left something to be desired: just 260,000 pixels.
Data storage was also crude. Image data was, and still is, stored in a low-level format based on 80-character punch-card records. But the flat files used to store the data proved difficult to search and otherwise manipulate.
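To make the "80-character punch card" idea concrete, here is a minimal sketch of reading a header built from fixed-width 80-character records. The article does not name the format. The layout assumed here follows the FITS convention: keyword in columns 1-8, a "= " value indicator in columns 9-10, and the value from column 11, optionally followed by "/ comment". The function names are hypothetical.

```python
from __future__ import annotations

CARD_WIDTH = 80  # every record ("card") is exactly 80 characters


def split_cards(header: str) -> list[str]:
    """Split a header string into consecutive 80-character card records."""
    if len(header) % CARD_WIDTH != 0:
        raise ValueError("header length must be a multiple of 80")
    return [header[i:i + CARD_WIDTH] for i in range(0, len(header), CARD_WIDTH)]


def parse_card(card: str) -> tuple[str, str | None, str | None]:
    """Return (keyword, value, comment) for one card.

    Cards without '= ' in columns 9-10 (e.g. END) carry no value.
    Naive: a '/' inside a quoted string value would be misread as a comment.
    """
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None, None
    value, sep, comment = card[10:].partition("/")
    return keyword, value.strip(), comment.strip() if sep else None


def make_card(keyword: str, value: str, comment: str = "") -> str:
    """Build one card: keyword padded to 8 columns, value right-justified to column 30."""
    text = f"{keyword:<8}= {value:>20}"
    if comment:
        text += f" / {comment}"
    return text[:CARD_WIDTH].ljust(CARD_WIDTH)


header = make_card("NAXIS1", "512", "image width in pixels") + make_card("NAXIS2", "512") + "END".ljust(CARD_WIDTH)
for card in split_cards(header):
    print(parse_card(card))
```

Each lookup means scanning fixed-width text record by record. That is workable for one file but, as the paragraph notes, awkward to search across millions of them.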
Gray guided the construction of SkyServer, which holds 100 billion rows of data, has been accessed from a million distinct IP addresses, and serves 10,000 to 15,000 professional astronomers as well as countless schoolchildren who use it to complete astronomy reports.
Pan-STARRS, which Gray helped conceive, will be far larger. By the end of 2010 it will contain 300TB of data, with some individual tables as large as 20TB, Szalay said. The repository will include data on more than 140 billion cosmic objects and 5.5 billion actively tracked ones.
Though Pan-STARRS won't fill its full 1PB of storage capacity for many years, it will still rank as one of the world's largest databases.
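A quick back-of-envelope check puts the quoted figures in proportion. This assumes decimal units (1TB = 10^12 bytes, 1PB = 10^15 bytes) and, for the per-object figure, naively spreads all 300TB across 140 billion objects. Because the article says "more than" 140 billion, that average is an upper bound.

```python
# Back-of-envelope scale check using the figures quoted in the article.
# Assumption: decimal units, 1 TB = 10**12 bytes and 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15

planned_data = 300 * TB   # expected by the end of 2010
largest_table = 20 * TB   # "some individual tables as large as 20TB"
capacity = 1 * PB         # total storage
objects = 140 * 10**9     # "more than 140 billion cosmic objects"

fraction_used = planned_data / capacity       # share of capacity in use by end of 2010
tables_to_fill = capacity // largest_table    # number of 20TB tables that fit in 1PB
bytes_per_object = planned_data / objects     # naive average; an upper bound

print(f"{fraction_used:.0%} of capacity")
print(f"{tables_to_fill} tables of 20TB would fill 1PB")
print(f"~{bytes_per_object / 1000:.1f} KB per object")
```

On these assumptions the 2010 data set uses about 30% of the storage, and the average works out to roughly 2KB per object.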
Because Pan-STARRS is a clustered system, its data will be partitioned, with a separate names database serving as the index. Since most cosmic objects don't have names such as Earth or Alpha Centauri, most searches will be done via a graphical interface that, according to Szalay, "looks and feels a lot like MapQuest or Google Maps."
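The split between partitioned data and a separate names index can be sketched as below. The partitioning scheme, declination bands, is purely an assumption for illustration, since the article does not say how Pan-STARRS divides its data. `Cluster`, `partition_for`, `by_name` and `in_box` are hypothetical names. A name lookup goes through the small index. A positional (map-style) search skips the index and touches only the partitions that overlap the requested region.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical scheme: one partition per declination band.
NUM_PARTITIONS = 4


def partition_for(dec: float) -> int:
    """Map a declination in degrees [-90, 90] to a partition number."""
    band = int((dec + 90.0) / 180.0 * NUM_PARTITIONS)
    return min(band, NUM_PARTITIONS - 1)  # dec == +90 belongs to the last band


@dataclass
class Cluster:
    # Each partition maps obj_id -> (ra, dec) in degrees.
    partitions: list[dict[int, tuple[float, float]]] = field(
        default_factory=lambda: [{} for _ in range(NUM_PARTITIONS)])
    # Names index: name -> (partition, obj_id). Only a few objects have names.
    names: dict[str, tuple[int, int]] = field(default_factory=dict)

    def insert(self, obj_id: int, ra: float, dec: float, name: str | None = None) -> None:
        p = partition_for(dec)
        self.partitions[p][obj_id] = (ra, dec)
        if name is not None:
            self.names[name] = (p, obj_id)

    def by_name(self, name: str) -> tuple[float, float] | None:
        """Name search: one index lookup, then one partition lookup."""
        entry = self.names.get(name)
        if entry is None:
            return None
        p, obj_id = entry
        return self.partitions[p][obj_id]

    def in_box(self, ra_min: float, ra_max: float, dec_min: float, dec_max: float) -> list[int]:
        """Positional search: scan only partitions overlapping the dec range."""
        hits = []
        for p in range(partition_for(dec_min), partition_for(dec_max) + 1):
            for obj_id, (ra, dec) in self.partitions[p].items():
                if ra_min <= ra <= ra_max and dec_min <= dec <= dec_max:
                    hits.append(obj_id)
        return hits


sky = Cluster()
sky.insert(1, 219.9, -60.8, name="Alpha Centauri")  # approximate coordinates
sky.insert(2, 10.0, 5.0)                            # unnamed, hypothetical object
print(sky.by_name("Alpha Centauri"), sky.in_box(0, 20, 0, 10))
```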
Besides looking up data on individual stars or galaxies, Pan-STARRS will also be used for deep data mining: astronomical intelligence, if you will. For instance, Szalay hopes to import old astronomical data from the pre-digital age and run it through a spatial cross-matching engine to create a master database that links all past and present data about every single star or planet.
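The core of spatial cross-matching is pairing entries from two catalogs that lie within a small angle of each other on the sky. The sketch below is a simple zone-based version of the technique and is not the Pan-STARRS engine, which the article does not describe. The newer catalog is bucketed into declination strips one match-radius tall, so each old entry is compared only with objects in nearby strips. Separations use the haversine formula, so right ascension wrapping past 360° is handled automatically. Catalog contents and the 1-arcsecond radius are hypothetical.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (ra, dec) points, via haversine."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(old, new, radius_deg):
    """Pair each object in `old` with its nearest neighbour in `new` within radius_deg.

    old, new: dicts of id -> (ra_deg, dec_deg).
    Returns a dict old_id -> (new_id, separation_deg); unmatched ids are omitted.
    """
    zone_height = radius_deg
    zones = defaultdict(list)  # zone number -> [(id, ra, dec), ...]
    for nid, (ra, dec) in new.items():
        zones[math.floor(dec / zone_height)].append((nid, ra, dec))

    matches = {}
    for oid, (ra, dec) in old.items():
        # Any match differs in dec by at most radius_deg, so these zones suffice.
        lo = math.floor((dec - radius_deg) / zone_height)
        hi = math.floor((dec + radius_deg) / zone_height)
        best = None
        for z in range(lo, hi + 1):
            for nid, nra, ndec in zones.get(z, ()):
                sep = angular_sep_deg(ra, dec, nra, ndec)
                if sep <= radius_deg and (best is None or sep < best[1]):
                    best = (nid, sep)
        if best is not None:
            matches[oid] = best
    return matches


old_plates = {"plate-1": (10.0, 20.0), "plate-2": (359.9999, 0.0), "plate-3": (100.0, -30.0)}
survey = {101: (10.0001, 20.0001), 102: (0.0001, 0.0), 103: (10.5, 20.0)}
print(cross_match(old_plates, survey, radius_deg=1 / 3600))
```

A production engine would also prune by right ascension within each zone and keep ambiguous multi-candidate matches for review. Here the nearest candidate simply wins.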
Pan-STARRS will also serve as a cloud database for outside astronomers, who will be allowed to remotely run queries and store results within Pan-STARRS. An initial difficulty, Nieto-Santisteban acknowledged, is that most astronomers are used to writing applications in C++, not SQL.
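The C++-versus-SQL gap is largely procedural versus declarative thinking. The sketch below uses a local SQLite database as a stand-in for a remote server, and the `objects` and `my_bright` tables and their columns are hypothetical. It contrasts pulling every row into the client and filtering in a loop with stating the desired result in SQL and keeping that result as a table next to the data, which is roughly what "store results within Pan-STARRS" implies.

```python
import sqlite3

# Local in-memory database standing in for a remote server; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, 10.0, 20.0, 17.5), (2, 10.2, 20.1, 19.0), (3, 200.0, -5.0, 16.0)],
)

# Procedural habit: ship every row to the client and filter in a loop.
bright_loop = [
    obj_id
    for obj_id, ra, dec, mag in conn.execute("SELECT obj_id, ra, dec, mag FROM objects ORDER BY obj_id")
    if mag < 18.0
]

# Declarative style: describe the result and let the server filter it,
# saving the output as a table alongside the data instead of downloading it.
conn.execute("CREATE TABLE my_bright AS SELECT obj_id, ra, dec FROM objects WHERE mag < 18.0")
bright_sql = [row[0] for row in conn.execute("SELECT obj_id FROM my_bright ORDER BY obj_id")]

print(bright_loop, bright_sql)
```

Both approaches find the same objects. At 300TB, only the second is practical, because only the query and the small result set move, not the data.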