Last week's outage of Amazon's Elastic Compute Cloud (EC2), and specifically the company's lack of communication around that outage, point out the importance of transparency when working with a cloud provider.
For the most part, Amazon's communication with customers through the outage was done through its Service Health Dashboard. Aside from that, communications from the company were terribly lacking.
While there's no doubt that job one was to find the problems and get customers back up and running as quickly as possible, Amazon would have done its own name, and the name of the cloud as a computing model, a great deal of good had it been more transparent with its users and the general public through the process.
But even if it were suddenly filled with information and details on what went wrong and the steps that will be taken to ensure it won't happen again, it's shutting the barn doors after the horses have run free. Amazon should have been actively communicating with its customers and its customers' customers while the battle was still being fought.
In a blog post, Thorsten von Eicken of cloud management services provider RightScale calls Amazon's lack of communications through the process "the biggest failure in this event," and offers a list of improvements that would make future outages more bearable.
Moving to the more distributed model of cloud computing is a leap of faith. You're trusting another organization to ensure that your resources are accessible. And when things go wrong, the only thing that's going to maintain that trust is as-it-happens communications that are as transparent as possible.
If you're looking to move to a more cloud-like environment or otherwise outsource your IT infrastructure, there are a lot of things to think about when dealing with a prospective vendor. Making sure the service level agreements (SLAs) meet your needs and expectations is key. But whether you're planning on outsourcing to a cloud giant like Amazon or a local service provider, it's worth taking some time to find out how they communicate with their customers. Particularly when things go wrong.
Look for vendors that are proactive with support and communications, particularly if the workload you're looking at outsourcing is of the mission-critical variety. That way, if things do go wrong, at least you won't be left sitting around wondering what's going on and when you can expect to be back up and running.
There's still a lot to like about Amazon EC2. In many ways, the fact that some downtime is so newsworthy is a testament to the kind of uptime the service has enjoyed. It's like a plane crash being so much more newsworthy than a car accident, largely because air travel is so much safer than being on the road.
But in those rare, unfortunate instances when things do go wrong, you want your airline to keep you informed as quickly as possible while events are still unfolding. You should expect the same kind of transparency from your cloud provider as well.