Disruptive innovation in infrastructure is on the rise, and nowhere is that more evident than in the Software Defined Networking movement. But while much of the SDN discussion has focused on the data center, the better initial use case might be in the wide area network. One advocate of that approach is Michael Elmore, IT Senior Director of the Enterprise Network Engineering Infrastructure Group at Cigna, a global health service company headquartered in Bloomfield, Connecticut. Elmore is also on the board of the Open Networking User Group (ONUG). Network World Editor in Chief John Dix asked Elmore to participate in an email-based Q&A to explore the promise of Software Defined WANs.
Members of the Open Networking User Group, of which you are a member, have voted the WAN the top use case for SDN twice in a row now. Why do you think that is?
Consider this quote from a Wall Street firm at the recent Open Networking User Group meeting in New York: "Although much of Wall Street has focused on the 'sexy' datacenter aspect of SDN, interest in software-defined WAN has increased meaningfully and we believe SD-WAN could experience more rapid adoption than datacenter overlay technologies. SD-WAN can dramatically reduce the cost of WAN deployments by enabling cheaper bit rates in both CAPEX and OPEX (i.e., less cost for the same bandwidth or more bandwidth for the same cost as compared to MPLS) and less overprovisioning for the same SLAs."
What's more, the WAN tends to be more discrete in terms of organizational teams and the technology stack itself, meaning organizations can move faster to embrace SD-WANs. So, if you're interested in building a WAN that is better, faster and cheaper, there are some key issues to consider.
What WAN issues today would encourage a company to start exploring SD-WAN options?
There are many challenges and limitations with the predominant MPLS-based layer 3 VPN service offerings that have become the standard connectivity solution for many Fortune 500 companies over the past 15 years. Although these solutions have served the enterprise well in a time of limited options, the market is opening up and ripe for transformation.
Previous attempts to scale VPN overlays have not found their way to mainstream, due to protocol scalability limitations and the sheer configuration complexity required for a reasonably sized enterprise network. As more and more critical business applications -- such as voice, contact center and storage applications -- converge to an IP transport, a high-performing and ultra-resilient (self-healing) IP WAN fabric will become essential to the business.
Let's examine the WAN challenges today:
- The access cost component for MPLS services provided by Tier I service providers continues to be a challenge. Global and national providers are at the mercy of their wholesale relationships with the local exchange carriers and tend to pass these costs to the consumer, with a potential mark-up. Additional cost components include everything from the number of routes to multicast support and QoS requirements, all of which further inflate costs.
- The MPLS providers' control planes and forwarding information base (FIB) tables appear to be hitting scale limitations, causing providers to police the number of routes they are willing to accept from a customer. For the enterprise, this means more front-end negotiation, the risk of hitting these policed thresholds and ultimately the risk of dropping routes. It also means exposure to the costs the SPs incur (and potentially pass on) from the constant churn of hardware and perpetual maintenance needed to support increased demand for Provider Edge (PE) and backbone capacity.
- WANs today are not application aware, nor do they consider different application performance thresholds. Soft failures/regional brown-outs can have catastrophic impact on real-time applications.
- SLAs are only as good as the customer's ability to measure them and hold the providers accountable. Whether it's latency, jitter, packet loss or the absolute number of outages allowed per month, all of this requires significant management overhead. Although sourcing teams and enterprise service owners focus on negotiating a predetermined financial penalty for a specific SLA breach, these breaches often have a more material impact on the business, which cannot be compensated by collecting an SLA credit. How does an enterprise protect its net promoter score when a customer call is dropped due to a regional outage?
- Service provider maintenance is sometimes uncoordinated, resulting in unplanned business impact.
- Time to detect failures and restore service is often elongated. Both hard-down and soft-failure detection require synchronization between the service provider's control plane and the customer's control plane (bifurcation of control planes). Customers can tune the edge timers; however, they remain dependent on the provider's backbone to detect, hold down, withdraw and propagate the updates. This holds true for dual-carrier MPLS architectures as well, where customers rely on carrier A to withdraw the associated prefix(es) in a hard outage situation, so the disparate topologies can converge and restore the session path. It gets worse with a brown-out or regional outage, where carrier A would never withdraw the prefix(es), yet the application still degrades.
- There is no inherent data plane encryption. Some customers elect to implement over-the-top IPsec, which tends to impede the benefits of MPLS by decreasing overall scale, while adding an additional fault domain layer. Additionally, this requires distributed configuration steps for setup and key management.
- The customer's Layer 3 routing control plane is outsourced to the MPLS service provider, as customers are required to inject their remote site routing table into the SP's network, either statically or dynamically. At this point, the customer loses visibility, with very limited access to the provider edge, not to mention the backbone.
- Managing multi-homed default route selection in a single VRF requires the customer to provision site-of-origin (SOO) via a route map on the provider edge, with limited means to validate the configuration / implementation. This type of manual traffic steering can take days, if not weeks, to implement. The risk: outbound traffic destined to the closest exit point could suddenly transition to another multi-homed exit point, causing latency and application lag.
- Most SPs prohibit SNMP access to the premises equipment for proactive alerting and instrumentation, limiting visibility into what is happening in the MPLS "underlay."
- Time to provision is typically elongated and unpredictable when compared to the consumer market. How is it possible for a consumer to provision 10 to 250Mbps service in a few days or weeks to their home, yet it takes a corporate network administrator typically 60-90 days to get similar bandwidth provisioned? This is the classic and rigid LEC problem, represented by the wholesale dependency retail service providers have when delivering services to the enterprise customers. The retailers are often dependent on the LECs outside of their own territory. This challenge becomes exacerbated when trying to procure 'diversity' for multiple circuits.
- There's no inherent application-based path selection to facilitate routing cloud-based application access via the local internet.
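Several of the challenges above come down to the customer having to measure latency, jitter and packet loss itself, and to tell a hard outage apart from a brown-out that the carrier never signals. The following is a minimal sketch of that idea, not any vendor's implementation. The names `SlaThresholds`, `summarize` and `classify` are hypothetical, the probe results are assumed to be round-trip times in milliseconds with `None` marking a lost probe, and jitter is assumed to be the mean absolute difference between consecutive replies.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class SlaThresholds:
    """Hypothetical per-circuit limits taken from a negotiated SLA."""
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def summarize(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Reduce one window of probe results to latency, jitter and loss.

    Each entry is a round-trip time in ms, or None for a lost probe.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency = mean(received) if received else float("inf")
    # Assumed jitter definition: mean absolute delta between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return {"latency_ms": latency, "jitter_ms": jitter, "loss_pct": loss_pct}


def classify(rtts_ms: Sequence[Optional[float]], sla: SlaThresholds) -> str:
    """Label a probe window as 'healthy', 'brown-out' or 'hard-down'."""
    stats = summarize(rtts_ms)
    if stats["loss_pct"] == 100.0:
        return "hard-down"  # nothing came back at all
    breached = (
        stats["latency_ms"] > sla.max_latency_ms
        or stats["jitter_ms"] > sla.max_jitter_ms
        or stats["loss_pct"] > sla.max_loss_pct
    )
    # The circuit is up but out of tolerance: the soft-failure case.
    return "brown-out" if breached else "healthy"
```

A window such as `[20, None, 180, 25, None, 400]` would be classified as a brown-out under thresholds of 150 ms latency, 30 ms jitter and 1% loss, even though the circuit never went down and no prefix would have been withdrawn.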
So if those are the WAN challenges today, what is the SDN promise?
In short, the SD-WAN can enable customers to take back control from service providers, while creating new market opportunities for those service providers.
If customers could create SD-WANs that separate the underlying transport from a software-based, overlay control plane on controller(s) owned by the customer, it would empower them, among other things, to centrally manage security policies and make application-based routing decisions dynamically and based on application performance criteria -- all independent of the underlying transport.
The underlay just becomes a set of common IP circuits with next hop reachability. This opens the door for customers to go direct to the local market (LEC, MSOs, etc.) to procure more cost-effective bandwidth with the right mix of transport technologies and SLAs required for the business, without compromising or fragmenting the logical routing topology.
Consider a company that has business process outsourcing, business-to-business, internal, or other WAN constructs, which increase complexity and cost. What if a network administrator could build an underlying network with various transport providers and glue the transport together with a unified overlay providing centralized policy management via a controller to create logical segmentation for multi-tenancy? Essentially, this would drive up the efficiency rate, creating a more cost-effective network.
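To make the idea of centrally managed logical segmentation concrete, here is a small sketch under stated assumptions. The `Segment` and `PolicyController` classes, the segment names and the address ranges are all hypothetical, and a real controller would distribute such policy to edge devices rather than answer queries in-process. The point is only that "which segment is this, and what may it reach" can live in one place, independent of which carrier delivers the packets.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional, Set


@dataclass
class Segment:
    """A logical tenant on the overlay (hypothetical model)."""
    name: str
    prefixes: List[str]                                # networks in this segment
    may_reach: Set[str] = field(default_factory=set)   # other segments it may reach


class PolicyController:
    """Single source of truth for segmentation policy."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments: Dict[str, Segment] = {s.name: s for s in segments}

    def segment_of(self, addr: str) -> Optional[str]:
        """Return the segment that owns an address, or None if unknown."""
        ip = ip_address(addr)
        for seg in self.segments.values():
            if any(ip in ip_network(p) for p in seg.prefixes):
                return seg.name
        return None

    def permits(self, src: str, dst: str) -> bool:
        """Default deny: allow only intra-segment or explicitly listed flows."""
        s, d = self.segment_of(src), self.segment_of(dst)
        if s is None or d is None:
            return False
        return s == d or d in self.segments[s].may_reach


# Example: internal users and an outsourcing partner share one underlay
# but can only meet in a shared-services segment.
controller = PolicyController([
    Segment("internal", ["10.0.0.0/16"], {"shared"}),
    Segment("bpo", ["10.50.0.0/16"], {"shared"}),
    Segment("shared", ["10.200.0.0/24"]),
])
```

With this example policy, `controller.permits("10.50.1.5", "10.0.1.5")` is false, so the outsourcing partner cannot reach internal hosts even though both ride the same circuits.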
The benefits become exponential when you couple an SD-WAN strategy with converging and centralizing/regionalizing services such as SIP voice, IP Contact Center and other services, which are often distributed and reside on edge CPE today (DSPs, SRST).
Let's dive into how each of the previous problem statements gets addressed with SD-WAN.
- An SD-WAN overlay enables customers to regionalize their transport and go directly to the LEC markets to reduce the double margin effects inherent to the traditional national/global provider model.
- Ethernet services enable new commercial off-the-shelf CPE options.
- Separating the underlying transport from the control plane, with a software-based overlay/controller(s) owned by the customer, reduces the dependency on the carriers and their scale limitations. Essentially, the carrier becomes "next hop reachability" via IP circuits, with an intelligent overlay managed by the customer to orchestrate the enterprise routing. Scale becomes directly proportional to the capacity of the SDN controller.
- WANs today are not application-aware, nor do they consider the application performance thresholds. Soft failures/regional brown-outs can have unpredictable and adverse impact to real-time applications.
- Application- or performance-aware routing is a game-changing feature that enables customers to monitor the performance of the underlay and make real-time dynamic routing decisions by application. This dynamic detection and convergence capability will improve overall service quality by avoiding manual intervention and troubleshooting, while increasing a customer's probability of hitting internal SLAs and its ability to respond to unplanned carrier maintenance. This type of dynamic performance awareness has the potential to decrease the reliance on hop-by-hop QoS policy management as well.
- Many SD-WAN products come standard with data plane encryption and control plane security. Most enterprise security teams have stopped asking for internal WAN encryption due to cost, scale and manageability challenges. SD-WAN is an opportunity to provide a consistent authentication and transport encryption policy regardless of the underlying transport mechanism or service provider.
- Central policy management and segmentation now become a reality, and multi-tenancy increases the efficiencies of the underlying transport. "Who are you" and "what do you need access to" are based on user policy.
- A carrier-agnostic approach provides full visibility and unification of the routing table, including in a multi-homed default route scenario.
- Services such as QoS and multicast will be inherent to the customer controlled overlay.
- Alerting and management will be innate, enabling visibility into underlay performance.
- Customers can leverage non-traditional transport for connectivity, such as cable MSOs, broadband and/or business-class internet, LTE and 4G to improve delivery time frames.
- Optimized path selection for cloud services, such as web conferencing, Office 365, HR (Workday) and other cloud-based apps via local internet links.
- Service chaining and NFV become a reality through logical steering of traffic for load balancing and firewall services.
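The application-aware routing and cloud path selection described in the list above can be sketched as a simple selection function. This is an illustration under assumptions, not how any particular SD-WAN product works: `PathMetrics`, `AppProfile`, `select_path`, the `cost_rank` field and all the numbers are hypothetical. The assumed rule is to use the cheapest path that meets the application's thresholds, prefer a local internet exit for cloud-hosted applications, and fall back to the least lossy path when nothing qualifies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathMetrics:
    """Measured state of one underlay circuit (hypothetical fields)."""
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    cost_rank: int                 # lower means cheaper; an assumed ranking
    is_local_internet: bool = False


@dataclass(frozen=True)
class AppProfile:
    """Performance thresholds an application can tolerate."""
    name: str
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float
    cloud_hosted: bool = False


def meets(path: PathMetrics, app: AppProfile) -> bool:
    """True when the path is within every threshold for the application."""
    return (
        path.latency_ms <= app.max_latency_ms
        and path.jitter_ms <= app.max_jitter_ms
        and path.loss_pct <= app.max_loss_pct
    )


def select_path(app: AppProfile, paths: Sequence[PathMetrics]) -> PathMetrics:
    """Pick an underlay path for an application from current measurements."""
    if not paths:
        raise ValueError("no underlay paths available")
    eligible = [p for p in paths if meets(p, app)]
    if not eligible:
        # Nothing qualifies: degrade to the least lossy, then lowest-latency path.
        return min(paths, key=lambda p: (p.loss_pct, p.latency_ms))
    # Cloud-hosted apps prefer a local internet exit; ties break on cost.
    return min(
        eligible,
        key=lambda p: (not (app.cloud_hosted and p.is_local_internet), p.cost_rank),
    )
```

Re-running `select_path` each time new measurements arrive is what would move a voice flow off a circuit during a brown-out without waiting for the carrier to withdraw a route.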
How close are we to realizing this nirvana vision?
The technology is very close, both from the traditional equipment suppliers and early stage start-ups. However, vendors are taking different approaches for prioritizing the features they will implement, and in developing their product roadmaps. Many of the approaches will overlay Internet transport in the long run. The timing seems appropriate, especially given the maturation that has taken place with real-time services and codecs moving from narrow band to wide band, driving up the tolerance for Internet performance characteristics.
So, if an organization likes the sound of SD-WANs, what kinds of questions should it be mulling to see if it is a good option?
Are there remaining concerns or potential speed bumps that enterprise customers should consider?
The SD-WAN approach could also lead to carrier proliferation. How many carriers are too many? One side of the spectrum will suggest the more carriers, the better unit pricing. However, the resources required to manage a certain number of carriers may ultimately be unsustainable, hitting a diminishing marginal utility effect.
There is also the question of open versus closed. Many of these solutions will be shrink-wrapped, closed alternatives, so if you want openness and the ability to integrate multiple suppliers across a single overlay, you may need to wait.
It's clear that the time is now for enterprises to perform a market scan and develop a detailed set of problem statements to address. As a potential SD-WAN consumer here in the early stages of this emerging market, you have an opportunity to help guide development efforts and prioritization with the core suppliers.