This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
While cloud computing has proven to be beneficial for many organizations, IT departments have been slow to trust the cloud for business-critical Microsoft SQL Server workloads. One of their primary concerns is the availability of their SQL Server, because traditional shared-storage, high-availability clustering configurations are not practical or affordable in the cloud.
Amazon Web Services and Microsoft Azure both offer service level agreements that guarantee 99.95% uptime (less than 4.38 hours of downtime per year) for IaaS servers. Both SLAs require deployment in two or more AWS Availability Zones or Azure Fault Domains, respectively. Availability Zones and Fault Domains let you run instances in locations that are physically independent of each other, with separate compute, network, storage and power for full redundancy. AWS has two or three Availability Zones per region, and Azure offers up to three Fault Domains per “Availability Set.”
This arrangement guarantees that at least 99.95% of the time at least one of the locations (Availability Zones or Fault Domains) will be operational. In the event of a failure of one location, a load balancer will redirect traffic to the instances in the other location.
For web servers and other non-transactional applications this can be sufficient for high availability. However, simply redirecting clients to a different SQL Server instance does nothing to address the fact that each instance will now have a different data set. Something needs to be done to ensure that the data remains in sync between the SQL Server instances and that client redirection is done seamlessly with minimal downtime.
For a company whose Microsoft SQL Server and other important applications go down in a cloud outage, the modest service fee refunds (a 10% refund for falling short of 99.95% uptime, and a 25-30% refund for falling short of 99% uptime) may be of little consolation. According to analyst firm CloudHarmony, Amazon EC2 and Amazon EBS combined had 46 outages ranging from 19 seconds to 2.8 hours from mid-June 2014 to mid-June 2015. Microsoft Azure Virtual Machines and Object Storage experienced 242 outages ranging from 10.4 minutes to 13.16 hours during the same period.
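The SLA arithmetic above is easy to check. The following is a minimal Python sketch, not either provider's billing logic: the function names are hypothetical, and because the article quotes a 25-30% range for the lower tier, the 25% default is an assumption.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(sla_percent: float) -> float:
    """Hours of downtime per year that still meet the given uptime percentage."""
    return HOURS_PER_YEAR * (100.0 - sla_percent) / 100.0


def service_credit_percent(measured_uptime_percent: float,
                           lower_tier_credit: float = 25.0) -> float:
    """Refund tier for a measured uptime, using the thresholds quoted above.

    lower_tier_credit is an assumption: the article gives a 25-30% range.
    """
    if measured_uptime_percent < 99.0:
        return lower_tier_credit
    if measured_uptime_percent < 99.95:
        return 10.0
    return 0.0


print(f"99.95% uptime allows {downtime_budget_hours(99.95):.2f} hours of downtime per year")
print(f"99.5% measured uptime earns a {service_credit_percent(99.5):.0f}% credit")
```

A 99.95% target leaves a budget of about 4.38 hours per year, while 99% leaves roughly 87.6 hours, which is why the refund tiers matter less than avoiding the outage in the first place.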
High availability in cloud environments
To make the cloud practical for business-critical applications, you need a way to mitigate downtime using high availability (HA) protection, traditionally provided by failover clusters. In a failover cluster, two or more servers are configured with shared storage (typically a SAN). In the event of a failure on the primary server, software such as Windows Server Failover Clustering (WSFC) moves the application operation to the secondary server. Since both servers share storage, operation can continue without data loss. Seamless failover/failback also enables software updates and patches to be installed while minimizing downtime associated with planned maintenance.
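The failover behavior just described can be illustrated with a toy model. This Python sketch is only an illustration under stated assumptions: the class names (`SharedStorage`, `Node`, `FailoverCluster`) are hypothetical and do not correspond to any Windows Server Failover Clustering API, and health checking is reduced to a flag.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stands in for the SAN volume that every cluster node can reach."""
    rows: list = field(default_factory=list)


@dataclass
class Node:
    name: str
    healthy: bool = True


class FailoverCluster:
    """Toy cluster: one active node at a time, all nodes share one storage."""

    def __init__(self, nodes, storage):
        self.nodes = list(nodes)
        self.storage = storage
        self.active = self.nodes[0]

    def write(self, row):
        if not self.active.healthy:
            raise RuntimeError(f"{self.active.name} is down; failover required")
        self.storage.rows.append(row)

    def check_and_failover(self):
        """Move the active role to the first healthy node if the current one failed."""
        if self.active.healthy:
            return self.active
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return node
        raise RuntimeError("no healthy node available")


storage = SharedStorage()
cluster = FailoverCluster([Node("primary"), Node("secondary")], storage)
cluster.write("order-1")
cluster.nodes[0].healthy = False   # simulate a failure on the primary
cluster.check_and_failover()
cluster.write("order-2")           # the secondary continues on the same storage
print(cluster.active.name, storage.rows)
```

Because both nodes write to the same `SharedStorage` object, nothing is lost when the active role moves, which is exactly the property that depends on shared storage.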
The problem is that in most cloud environments, including both AWS and Azure, cluster-aware shared storage is not available. This gives database administrators two basic options: keep Microsoft SQL Server and other critical applications on-premises, or add replication software to create a SANless cluster in the cloud.
Clusters for High Availability within the cloud
SANless clusters offer a simple, highly efficient way to implement a failover cluster in a cloud. You either use purpose-built SANless clustering software on its own or add it to your Windows Server Failover Clustering environment. The software uses efficient replication to synchronize storage in two or more servers (physical, virtual, or cloud).
By continuously synchronizing the data from primary to remote storage using real-time, block-level replication, the software makes the storage appear to WSFC as a traditional SAN, regardless of the type of storage or where it is located. SANless clustering software is designed to be storage agnostic; that is, it is capable of working with the local or direct-attached storage normally used in public clouds, as well as with storage area networks (SANs), iSCSI storage and network-attached storage.
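To make the idea of block-level replication concrete, here is a minimal Python sketch of a synchronous mirror. It is an illustration only, not how any particular SANless product is implemented: `BlockDevice`, `MirroredVolume` and the 4,096-byte block size are assumptions made for the example, and the "disks" are in-memory byte arrays.

```python
BLOCK_SIZE = 4096  # bytes per block; an arbitrary size chosen for this sketch


class BlockDevice:
    """In-memory stand-in for a local or direct-attached disk."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def write_block(self, index: int, payload: bytes) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = index * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        start = index * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])


class MirroredVolume:
    """Applies every block write to the source and the target before returning."""

    def __init__(self, source: BlockDevice, target: BlockDevice):
        self.source = source
        self.target = target

    def write_block(self, index: int, payload: bytes) -> None:
        self.source.write_block(index, payload)
        self.target.write_block(index, payload)  # replicate before acknowledging


primary_disk = BlockDevice(num_blocks=8)
replica_disk = BlockDevice(num_blocks=8)
volume = MirroredVolume(primary_disk, replica_disk)
volume.write_block(3, b"x" * BLOCK_SIZE)
print(replica_disk.read_block(3) == primary_disk.read_block(3))
```

Since the mirror works on raw blocks rather than files or database rows, it does not need to know what kind of storage sits underneath, which is the sense in which such software can be storage agnostic.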
Of significance in HA cloud configurations, synchronization software also handles write acknowledgements in a way that assures satisfactory performance over a WAN link to an Availability Zone or Fault Domain in a distant datacenter. Some solutions even offer data compression and advanced bandwidth management techniques to further improve WAN performance.
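The article does not say exactly how write acknowledgements are handled, so the following Python sketch shows one common approach as an assumption: asynchronous replication, where a write is acknowledged once it is stored locally and is sent to the remote copy afterwards, with compression applied before it crosses the WAN. The `AsyncReplicator` class is hypothetical; `zlib` is the standard-library compression module.

```python
import zlib
from collections import deque


class AsyncReplicator:
    """Acknowledges writes locally and ships them to the remote copy later."""

    def __init__(self):
        self.local = {}          # block index -> bytes on the primary
        self.remote = {}         # block index -> bytes on the distant replica
        self.pending = deque()   # compressed writes not yet sent over the WAN
        self.bytes_sent = 0

    def write(self, index: int, payload: bytes) -> None:
        """Return as soon as the local write is done; queue the remote one."""
        self.local[index] = payload
        self.pending.append((index, zlib.compress(payload)))

    def drain(self) -> None:
        """Send queued writes in order, as a background sender would."""
        while self.pending:
            index, packed = self.pending.popleft()
            self.bytes_sent += len(packed)
            self.remote[index] = zlib.decompress(packed)


rep = AsyncReplicator()
rep.write(0, b"A" * 4096)
rep.write(1, b"B" * 4096)
lag = len(rep.pending)      # writes acknowledged locally but not yet on the replica
rep.drain()
print(lag, rep.remote == rep.local, rep.bytes_sent)
```

The trade-off in this design is that any writes still in the queue when the primary fails have not reached the replica, so lower write latency comes at the cost of a small window of potential data loss.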
Being agnostic to storage systems also facilitates use of hybrid cloud configurations where, for example, a cluster protecting SQL applications in an enterprise data center using a SAN can be extended to a cluster node in a cloud. This configuration provides a cost-efficient DR option without the cost and complexity of managing your own secondary data center.
Companies can use SANless clustering software that is fully integrated with WSFC, which lets them deploy clusters in a cloud without specialized training or changes to standard IT operations. Other SANless clustering software supports Linux as well as Windows environments, where it monitors the complete application stack, manages application failover, and synchronizes storage. It enables complete configuration flexibility and provides a simple, cost-efficient HA and DR solution where traditional clusters are impossible or impractical.
The ability to use the familiar and proven Windows Server Failover Clustering technology in both Amazon Web Services and Microsoft Azure clouds makes SANless clustering software an affordable solution that is worth considering for SQL Server and other business-critical applications.
Bermingham is a high availability expert who has been recognized by his peers as a Microsoft MVP in Clustering since 2010. As Technical Evangelist at SIOS Technology, he focuses on Microsoft high availability and disaster recovery solutions, as well as providing hands-on support, training and professional services for cluster implementations.