Although somewhat late to the market for cloud computing infrastructure, OpenStack enjoys an advantage over other cloud stacks in that it has a modular architecture, said one of the first developers of the open-source cloud software.
"This is very important. There is no one way to do OpenStack, and this is very important," said Chris Kemp, who oversaw the development of the OpenStack cloud controller when he was CIO of the NASA Ames Research Center. Kemp spoke Thursday at the OpenStack Conference in Boston.
The developers behind OpenStack often tout its ability to scale beyond the limits of other platforms. "We're not talking about [using OpenStack to run a] cloud of 100 servers or even 1,000 servers, but tens of thousands of servers. Other options out there aren't really considering that scale," said Jonathan Bryce, chairman of the OpenStack Project Policy Board.
For Kemp's team at NASA, scalability was an essential feature. The center needed an infrastructure that could handle the millions of Internet visitors who wanted to view NASA's large collection of space imagery, Kemp explained in an interview with the IDG News Service.
Originally, Ames tried using the open-source Eucalyptus platform but had trouble scaling the software to the required levels. Kemp's team even submitted code improvements to the Eucalyptus project team -- then headquartered at the University of California, Santa Barbara -- that they thought would help. Few of those changes, however, were incorporated into the software.
So, Kemp formed a team of about 20 engineers and developers who built their own cloud controller, called Nova. Nova caught the interest of Rackspace, which was then building the first iteration of OpenStack.
"The major difference of the architecture for Nova is that it wasn't a monolithic product where everything was bolted together. If you were trying to do something different you would basically have to rip the whole thing apart," Kemp said.
With this architecture, if there is a bottleneck somewhere in the system, the component responsible for it can easily be replaced with another.
Every cloud platform "will encounter a bottleneck in the different part of the infrastructure, depending on the workload, topology," or some other factor, Kemp said. OpenStack has been engineered so that "when you have one of these bottlenecks, you can pull out that piece and plug in something else."
"OpenStack has really been designed as a cloud platform, so every piece can operate as a stand-alone component or plugged in," he said. The storage component of OpenStack is a stand-alone project called Swift. The networking duties are handled by another component, called Quantum.
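The swap-a-component idea Kemp describes can be illustrated with a minimal Python sketch. It is not OpenStack code: the names `StorageBackend`, `InMemoryStorage`, `ShardedStorage` and `CloudController` are hypothetical, chosen only to show how a controller written against an interface lets one piece be pulled out and another plugged in.

```python
import zlib
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface every storage component must satisfy."""

    @abstractmethod
    def put(self, key, data):
        ...

    @abstractmethod
    def get(self, key):
        ...


class InMemoryStorage(StorageBackend):
    """Simplest possible backend: one dictionary on one machine."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class ShardedStorage(StorageBackend):
    """Drop-in replacement that spreads objects across several backends."""

    def __init__(self, shards):
        self._shards = list(shards)

    def _pick(self, key):
        # crc32 is deterministic, so a key always maps to the same shard.
        return self._shards[zlib.crc32(key.encode()) % len(self._shards)]

    def put(self, key, data):
        self._pick(key).put(key, data)

    def get(self, key):
        return self._pick(key).get(key)


class CloudController:
    """Depends only on the interface, never on a concrete backend."""

    def __init__(self, storage):
        self.storage = storage

    def save_image(self, name, data):
        self.storage.put(name, data)

    def load_image(self, name):
        return self.storage.get(name)
```

If the single dictionary became the bottleneck, the controller could be constructed with `ShardedStorage([InMemoryStorage(), InMemoryStorage()])` instead, with no change to `CloudController` itself.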
The plug-in architecture also encourages a wide range of companies to provide various components for the stack, Kemp noted. They can then compete against one another to provide the best implementations or carve out a segment of the market for special-use implementations.
Kemp himself now leads Nebula, a startup that plans to offer an OpenStack appliance for coordinating large numbers of individual generic servers as a unified cloud service. Nebula is also developing a new dashboard for the stack that can be easily incorporated into the core code base.
Another contributor to OpenStack is Dell, which offers an OpenStack-based hardware set. Dell developed and released an OpenStack installer, called Crowbar, after its engineers found the installation process difficult, said John Igoe, Dell's executive director for the cloud and big data, during another talk.
Other gaps in OpenStack still need to be addressed as well, perhaps with additional modules. Tim Bell, IT manager for CERN, the European Organization for Nuclear Research, noted a few in a talk explaining how CERN is testing OpenStack to better handle its immense workloads.
Bell noted that OpenStack's Nova needs a way of scheduling work close to the data needed for that work. Otherwise, any benefits gained from cloud computing would be lost to the additional network bandwidth required to move the data back and forth.
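The kind of data-aware placement Bell is asking for might look something like the following Python sketch. It is an illustration only, not Nova's scheduler: the `Host` record, its fields and the `pick_host` function are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical description of one compute node."""
    name: str
    free_slots: int
    datasets: set = field(default_factory=set)  # data stored locally


def pick_host(hosts, dataset):
    """Choose a host for a job that reads `dataset`.

    Hosts already holding the data are preferred; ties are broken by
    free capacity. Returns None when every host is full.
    """
    candidates = [h for h in hosts if h.free_slots > 0]
    if not candidates:
        return None
    # Sort key: False sorts before True, so hosts holding the data win;
    # among equals, the host with the most free slots wins.
    return min(
        candidates,
        key=lambda h: (dataset not in h.datasets, -h.free_slots),
    )
```

The fallback matters as much as the preference: when the host holding the data is full, the job still runs elsewhere and pays the network cost, rather than waiting indefinitely.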
Other refinements are needed as well, he added. Administrators need a way of choosing which types of jobs should be scheduled first when there is a backlog. More controls for billing, availability and performance monitoring are also needed.
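One simple way to let administrators decide which job types run first is a priority queue keyed on an administrator-supplied ranking. The Python sketch below uses the standard `heapq` module; the `JobBacklog` class and the job-type names in the usage note are hypothetical and do not reflect any existing OpenStack feature.

```python
import heapq
import itertools


class JobBacklog:
    """Backlog that releases jobs by administrator-defined priority."""

    def __init__(self, priorities):
        # Mapping of job type -> rank; a lower rank is scheduled first.
        self._priorities = priorities
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a type.
        self._counter = itertools.count()

    def submit(self, job_type, job_id):
        rank = self._priorities[job_type]
        heapq.heappush(self._heap, (rank, next(self._counter), job_id))

    def next_job(self):
        """Return the next job ID to run, or None if the backlog is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With `JobBacklog({"reprocessing": 0, "analysis": 1})`, a reprocessing job submitted after two analysis jobs would still be handed out before them.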