"Everything fails, all the time," so says Amazon.com CTO Werner Vogels.
Amazon Web Services itself experienced a much-publicized four-day service disruption last April and another outage in August, and it had plenty of company from other cloud service companies last year. Microsoft's Windows Azure cloud platform had downtime problems in February after the company failed to account for Leap Day. Despite improvements by cloud providers to minimize future outages, more outages will inevitably happen this year and beyond.
Here are steps experts say enterprise IT shops should take to keep cloud outages from knocking them out:
1) With AWS, use multiple availability zones.
Amazon Web Services offers "availability zones" (AZs) in each of its regions and for each of its services. The company describes each AZ as running on its own physically distinct, independent infrastructure. "They are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone." During last year's outage, about 45% of customers who used only a single AZ for the Relational Database Service were affected, compared to less than 3% of customers who used a multi-AZ approach, AWS said in a post-mortem report. After last year's outage, the company made it easier for customers to use a multi-AZ approach by letting them use a common design and APIs to distribute instances across AZs.
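As a rough illustration of the multi-AZ idea, the sketch below spreads a handful of instances across zones in round-robin order and then shows what survives if one zone goes dark. The instance names, the zone labels and both function names are made up for this example; it does not call any AWS API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_ids, zones):
    """Assign each instance to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return {instance_id: next(zone_cycle) for instance_id in instance_ids}


def surviving_instances(placement, failed_zone):
    """Return the instances that are not placed in the failed zone."""
    return [name for name, zone in placement.items() if zone != failed_zone]


# Hypothetical instance names and zone labels, for illustration only.
placement = spread_across_zones(
    ["web-1", "web-2", "web-3", "web-4", "web-5", "web-6"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)

# How many instances landed in each zone.
print(Counter(placement.values()))

# Instances still running if the first zone fails.
print(surviving_instances(placement, "us-east-1a"))
```

With six instances spread over three zones, losing any one zone still leaves four instances running, whereas a single-AZ deployment would lose all six.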
2) With AWS, use multiple regions.
AWS has a network of eight regions: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and AWS GovCloud. For extra security and protection beyond a multi-AZ approach, users can place workloads in multiple regions. It's not quite as easy as putting workloads in multiple AZs, though, as separate API calls are needed for the different regions.
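The "separate API calls" point can be pictured with a small sketch: one deployment call per region, with a failure in one region not stopping the others. Here `deploy_everywhere` and `fake_deploy` are hypothetical stand-ins written for this example, not part of any AWS SDK, and the region codes are used purely as labels.

```python
def deploy_everywhere(regions, deploy_to_region):
    """Call the deploy function once per region and record each outcome."""
    results = {}
    for region in regions:
        try:
            results[region] = deploy_to_region(region)
        except Exception as exc:
            # One region failing should not stop the remaining regions.
            results[region] = f"failed: {exc}"
    return results


def fake_deploy(region):
    """Stand-in for a real, region-scoped API call (hypothetical)."""
    if region == "eu-west-1":
        raise RuntimeError("region unavailable")
    return "ok"


results = deploy_everywhere(["us-east-1", "us-west-2", "eu-west-1"], fake_deploy)
print(results)
```

In a real deployment, the stand-in would be replaced by whatever region-specific client or endpoint the provider's tooling requires.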
3) Use multiple cloud providers.
Still don't feel protected even with a multi-AZ, multi-region approach? Use multiple cloud providers then, advises Drue Reeves, a Gartner cloud analyst. This comes with caveats as well, since some service providers share common data center resources. Reeves says customers can check with individual providers to see if they are sharing resources with any others that the customer may be using.
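Reeves' caveat about shared resources amounts to a simple overlap check once the providers have answered the question. The sketch below assumes the customer has collected, for each provider, the set of data centers it relies on. The provider and facility names are invented, and `shared_facilities` is a hypothetical helper.

```python
from itertools import combinations


def shared_facilities(provider_sites):
    """Map each pair of providers to the facilities they have in common."""
    overlaps = {}
    for first, second in combinations(sorted(provider_sites), 2):
        common = provider_sites[first] & provider_sites[second]
        if common:
            overlaps[(first, second)] = common
    return overlaps


# Invented provider and data center names, for illustration only.
provider_sites = {
    "provider-a": {"dc-north", "dc-east"},
    "provider-b": {"dc-east"},
    "provider-c": {"dc-west"},
}

overlaps = shared_facilities(provider_sites)
print(overlaps)
```

Any pair that shows up in the result shares at least one facility, so splitting workloads between those two providers would offer less independence than it appears to.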
4) Outline availability in SLAs.
Beyond taking technical measures, customers can take nontechnical steps, such as negotiating with their cloud service provider over service-level agreements (SLAs) that specify penalties to be paid in the case of a disruption. If a customer is using a cloud provider for disaster recovery services, the SLA might mandate as much as 99.999% availability.
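To put an availability percentage in perspective, it helps to convert it into the downtime it permits. The sketch below does that arithmetic for a 365-day year; the function name and the choice of a yearly measurement period are assumptions made for this example, and actual SLAs often measure availability per month.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime, in minutes, that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

At 99.999%, the allowance works out to a little over five minutes a year, compared with nearly nine hours at 99.9%.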
5) If you can't take the heat, stay away from the fire.
If a user is extremely concerned about high availability of data and applications in the cloud, Steve Hendrick, an IDC analyst, says perhaps that means the customer isn't ready for a public cloud. Hendrick says it's a simple equation: The more mission-critical the data and compute resources are, the more protections for resiliency and high availability the customer should put in place.
Network World staff writer Brandon Butler covers cloud computing and social media. He can be reached at BButler@nww.com and found on Twitter at @BButlerNWW.