A new open source project dubbed StarCluster has been released with the aim of simplifying the management of virtual clusters hosted on Amazon’s Elastic Compute Cloud (EC2) service.
According to developer Justin Riley, StarCluster minimises the administrative overhead associated with obtaining, configuring, and managing a traditional computing cluster used in research labs or for general distributed computing applications.
The StarCluster project started at MIT’s Software Tools for Academics and Researchers (STAR) Program.
A first beta release, StarCluster 0.90, was posted on the Python Package Index (PyPI) last week and on Freshmeat.net yesterday.
StarCluster consists of a library and a set of scripts that interface with EC2 to automate the creation and deletion of clusters of virtual machines, so users pay only for the time the machines are running.
For end-users, the scripts are the main user interface and provide options for getting started with distributed computing on EC2, such as starting and stopping clusters and managing software configurations.
StarCluster also has an API that provides an interface to EC2 for manipulating nodes, executing commands on nodes, and copying files between nodes.
Using a configuration file supplied by the user (which includes EC2 account details), StarCluster requests cloud resources (number of machines, instance type) from Amazon and automatically configures the Linux machines with a queuing system, an NFS-shared /home directory, password-less SSH access, OpenMPI, and about 140GB of disk space.
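To illustrate the idea of a user-supplied file that turns into a resource request, the sketch below parses an INI-style configuration with Python's standard configparser module. The section names, key names, instance type, and AMI placeholder are hypothetical and chosen for illustration only; they are not StarCluster 0.90's actual configuration format, and the sketch does not contact EC2.

```python
import configparser

# Hypothetical example config: section and key names are illustrative only,
# not StarCluster's actual configuration format.
EXAMPLE_CONFIG = """
[aws]
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY

[cluster]
cluster_size = 4
instance_type = m1.large
image_id = ami-xxxxxxxx
"""


def parse_cluster_request(text):
    """Turn the hypothetical config text into a simple resource-request dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    aws = parser["aws"]
    cluster = parser["cluster"]

    # Number of machines requested; a cluster needs at least one node.
    size = cluster.getint("cluster_size")
    if size < 1:
        raise ValueError("cluster_size must be at least 1")

    return {
        "credentials": (aws["access_key_id"], aws["secret_access_key"]),
        "num_machines": size,
        "instance_type": cluster["instance_type"],
        "image_id": cluster["image_id"],
    }


if __name__ == "__main__":
    request = parse_cluster_request(EXAMPLE_CONFIG)
    print(request["num_machines"], request["instance_type"])
```

A real tool would pass such a request to EC2 and then run the post-boot setup steps (queuing system, NFS, SSH keys, MPI) described above; those steps are outside the scope of this sketch.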
StarCluster comes with a public Amazon Machine Image (AMI) on EC2 that includes the software stack for distributed computing.
The AMI is based on Ubuntu 9.04 (i386 and x86_64) and also includes the Sun Grid Engine software and Python libraries for scientific computing.
StarCluster is aimed at computational research labs and at classrooms with computational requirements.
“StarCluster is a way for graduate students and faculty to have an on-demand cluster,” according to the project. “This means students can access their research with the same hardware and software configurations wherever they go; even if they move to another institution.”
“It also removes the majority of system administration concerns since the initial setup procedures have been captured in StarCluster and in the user's software configurations. With this model there is also the benefit that if hardware problems occur it's easy to request a new set of machines in the cloud.”
Planned features include support for multiple clusters and dynamic resizing of EC2 clusters, in which nodes would be launched, added to the cluster, used for computation, and removed once they are idle.
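Since dynamic resizing is only a planned feature, the following is a purely hypothetical sketch of the kind of grow-and-shrink policy it describes, not StarCluster code. The Node class, the idle threshold, the min_nodes and max_nodes limits, and the rule of never removing a node named "master" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running_jobs: int
    idle_seconds: float  # time since the node last ran a job


def nodes_to_remove(nodes, idle_threshold=600.0, min_nodes=1, protected=("master",)):
    """Pick idle nodes to terminate (hypothetical policy).

    A node is removable when it is not protected, has no running jobs and has
    been idle for at least idle_threshold seconds. The cluster never shrinks
    below min_nodes.
    """
    idle = [
        n for n in nodes
        if n.name not in protected
        and n.running_jobs == 0
        and n.idle_seconds >= idle_threshold
    ]
    # Remove the longest-idle nodes first.
    idle.sort(key=lambda n: n.idle_seconds, reverse=True)
    removable = max(0, len(nodes) - min_nodes)
    return [n.name for n in idle[:removable]]


def nodes_to_add(queued_jobs, nodes, max_nodes=8):
    """Launch one node per queued job, capped at max_nodes in total (hypothetical)."""
    return max(0, min(queued_jobs, max_nodes - len(nodes)))


EXAMPLE_CLUSTER = [
    Node("master", 1, 0.0),
    Node("node001", 0, 1200.0),
    Node("node002", 0, 30.0),
    Node("node003", 0, 900.0),
]

if __name__ == "__main__":
    print(nodes_to_remove(EXAMPLE_CLUSTER))
    print(nodes_to_add(5, EXAMPLE_CLUSTER, max_nodes=6))
```

With the example cluster, node001 and node003 have been idle longer than the 600-second threshold and would be selected for removal, while node002 has only just finished work and is kept. Removing the longest-idle nodes first is one simple choice; a real implementation would also have to drain nodes from the queuing system before terminating them.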