Nate Baechtold, Enterprise Architect at EBSCO Information Services, says it was going to be too hard to automate the company’s VMware environment, so the firm shifted to OpenStack, which natively abstracts underlying components much like AWS. But the next sticking point was how to enable developers to build in load balancing. A self-service model using the existing hardware-based system was too complex, Baechtold tells Network World Editor in Chief John Dix, but a new software-defined tool fit the bill.
Let’s start with a thumbnail description of your organization.
EBSCO Information Services is a discovery service provider for many things, including private journals, research databases, historical archives, medical reference databases, ebooks and corporate learning tools. Many of these are things you couldn’t find on the public Internet. So, universities and other organizations subscribe to our services and we are able to federate searches over all these databases to provide the information they are looking for. This past year we peaked at somewhere close to 400 million searches per day.
Does it appeal to certain vertical markets, say legal or healthcare, or is it any and all of them?
Any and all of them. A large amount of traffic comes from libraries and universities providing research services to students.
What does the technical environment look like?
We have a public cloud based in AWS and three private data centers, two that support our live application and one that primarily supports development resources. The majority of our live runtime apps are supported by a private cloud we built on top of OpenStack.
The main data centers are in Boston and Ipswich, Mass., for redundancy’s sake and to create failure domains, and we have a large fiber link between them, but the idea is each data center is autonomous and can run without the other one. We have roughly 400 physical servers in each data center, and the majority of our workloads are virtualized, so we have 5,000-6,000 VMs. From a virtualization perspective, we’re using a combination of VMware and OpenStack, but we’re actually migrating everything over to OpenStack, which is built on top of KVM.
How long have you been building the OpenStack environment?
We opened it up for development two years ago, and about a year ago we started using it for our live resources. Ever since then we’ve had a large percentage of developers using it for self-service provisioning, and that evolved into a model where we started automating provisioning, automating deployments, really trying to automate all of our infrastructure.
Why the shift to OpenStack?
Because it was going to be too hard to automate our VMware platform. When you look at a cloud platform like AWS, you go in and get a VM and it is automatically assigned an IP address and receives everything it needs to run from the cloud platform. You are insulated from a lot of the underlying hardware implementation. VMware abstracts some elements of that, but ultimately you still need to know what data store to put it on, you need to name the network, maybe a VLAN identifier or something else that ties it to your infrastructure. There’s very little abstraction, and trying to build a fully automated model on top of that was going to be really difficult.
That’s why there are so many management platforms on top of VMware -- to insulate you from that API -- whereas OpenStack natively abstracts the underlying implementation. You create a consistent platform the same way you do in AWS, so you have an instance very analogous to Amazon’s EC2 (Elastic Compute Cloud), you have a volume in OpenStack that’s just like an EBS (Elastic Block Store) volume in AWS, you’ve got load balancing as a service, you’ve got images and many other things. They’re not API compatible, but they operate in a very similar way so it is easy to build infrastructure automation for your continuous integration/continuous deployment (CI/CD) pipeline.
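As a hedged illustration of the point above (not something from the interview itself), provisioning through OpenStack’s abstracted API can be scripted using only logical names, with no data stores or VLAN identifiers in sight. The sketch below uses the `openstacksdk` library; the cloud, image, flavor, and network names are hypothetical placeholders:

```python
# Sketch: provisioning a VM through OpenStack's abstracted compute API.
# All resource names used here are hypothetical placeholders.

def provision_server(conn, name, image_name, flavor_name, network_name):
    """Create a server using only logical names -- no data stores,
    VLAN IDs, or other hardware details required."""
    image = conn.compute.find_image(image_name)
    flavor = conn.compute.find_flavor(flavor_name)
    network = conn.network.find_network(network_name)
    server = conn.compute.create_server(
        name=name,
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    # Block until the cloud reports the server as ready.
    return conn.compute.wait_for_server(server)

# In a real CI/CD pipeline you would connect first, e.g.:
#   import openstack
#   conn = openstack.connect(cloud="mycloud")  # cloud name is hypothetical
#   provision_server(conn, "ci-worker-1", "ubuntu-20.04", "m1.small", "dev-net")
```

Because every resource is addressed through the same consistent API, the same function works unchanged across environments, which is what makes this style of automation practical.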
We viewed the adoption of the OpenStack API as an easy onramp for getting full infrastructure automation and also getting integration with our CI/CD processes. Additionally, since it’s built to be a public cloud product, we didn’t have to fight with permissions. With vSphere you have to deal with permissions to folders, resource groups, and many other things. Instead of having to fight with individual permissions, we give developers a project. That project is logically separated from all the other infrastructure, like they’re operating their own private data center.
It makes it easy for them to write automation because they don’t need to worry about breaking things, bringing down the whole environment or affecting people on shared resources. It lowers the barrier to entry to write automation, to experiment and test. Those are really the core capabilities that OpenStack gave us, which is why we went with it. It was mostly focused around the API and accelerating our development efforts and accelerating our infrastructure automation efforts.
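The project-per-team model described above can be sketched against OpenStack’s identity (Keystone) API. This is an illustrative outline using `openstacksdk`; the team, user, and role names are hypothetical:

```python
# Sketch: giving a development team its own isolated OpenStack project.
# Team, user, and role names are hypothetical placeholders.

def onboard_team(conn, team_name, user, role):
    """Create a project for a team and grant one user access to it.
    Inside the project the team can automate and experiment freely
    without touching shared resources."""
    project = conn.identity.create_project(
        name=team_name,
        description=f"Self-service project for {team_name}",
    )
    # Grant the user a role scoped to just this project.
    conn.identity.assign_project_role_to_user(project, user, role)
    return project
```

One project creation and one role assignment replace the per-folder, per-resource-group permission juggling described for vSphere above.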
What percentage of your workloads are on OpenStack at this point?
Of our virtualized infrastructure, I would say around half. The goal is to migrate everything.
You mentioned you have some AWS cloud resources. Is part of the reason to go with OpenStack because it will make it easier to use AWS in a spillover capacity?
Due to data locality and a whole bunch of other problems, it isn’t easy to realize a hybrid cloud where you transparently migrate workloads back and forth. We, like probably most companies, are actively working to get into AWS and to get to the public cloud, but we realize we still need a private cloud to be able to serve our own internal data centers in the meantime.
Do you think long term you’ll be all-in with a public IaaS service, getting away from managing your own stuff?
Yeah. I would say that is our long-term goal. How long it takes to get there is another question, but that would be our long-term goal. Today we use AWS for BI processing and hosting some of our runtime services.
As I understand it, another thing you virtualized was your Application Delivery Controllers. What led you down that path?
We created this private cloud where users could provision and tear down VMs to their heart’s content, and they did it very, very frequently. The level of change velocity in this environment is incredible. We’ve had over 420,000 VMs created and destroyed in the past two years.
But really a cloud isn’t useful until you’ve given your development and operations teams the ability to self-service all the capabilities they need to build their live applications. Out of the box they can build VMs. That’s great. However, they couldn’t hook them up to load balancing or many of the other things they needed. Load balancing was the number one pain point because you couldn’t build a highly available application without some semblance of load balancing.
So first we tried to create a self-service model on top of our existing hardware-based load balancing system, where we could enable teams to provision new content rules, new virtual IPs, everything they need to build and manage their applications. But it was surprisingly hard to do. To create a system that could be fully automated was almost impossible on our existing solution.
Did your hardware ADC provider offer a software version of their appliance?
Yes, they did. They offered a VM version but all it did was shift the problem. It didn’t solve the problem. The only way it helped us was to say, “Okay teams, now you configure and manage your own virtual load balancers.” They weren’t too happy with that because it added complexity.
It wouldn’t have been very efficient to take this problem that was solved before by a dedicated load balancing team and shift it so that now everyone had to become subject matter experts on a specific load balancing technology. So we looked into tapping into load balancer as a service on OpenStack and pointing it to our existing vendor, and that didn’t work out very well. The driver wasn’t very mature at that point in time and it wound up causing all sorts of problems. That’s what caused us to start looking for alternatives.
Can you give us some perspective in terms of what the load balancers were being asked to do?
We had a very SOA-heavy architecture. We probably had around 80 or so services in our mid- and back-tiers communicating with each other, so the edge, the front tier, was a small portion of what the load balancers were doing in this environment.
And what solved the problem for you?
We wound up seeing a company called Avi Networks at the OpenStack Summit and they had some really interesting demos. The attraction was multifold:
* First, from an access perspective and API perspective, they aligned perfectly with OpenStack’s multitenancy system. What they do is view a load balancer as a project, a tenant, just like OpenStack creates a project and a tenant, and that represents your view of the world. You can only see things in your tenant, you can only affect things in your tenant. If I give you a logical slice of Avi through a tenant, just like through OpenStack, you can only break things in your own world. It makes it easy to hand load balancing responsibilities off to different teams. We give you access to your view of the load balancer and you can perform all of the functions you need to build and manage your applications from the ground up automatically. That was really cool.
* The second thing, which wound up appealing to us even more, was the insight and analytics engine that came with it. We used to get some very raw metrics from a load balancer, but the analytics we get out of Avi are extremely valuable; things like better end-to-end performance results and automatic anomaly detection and tracking. And something that wound up being very useful was significant event detection. It logs what it sees as significant events and we’ve used that to find network issues that weren’t detected before.
Our development operations teams wound up liking that element probably the most out of all because now they’ve got all this visibility, all this insight into application performance they didn’t have before. It created a strong desire to migrate over to the product.
How did Avi address the need for simplicity, the problem you were having with the other product?
The setup and usage of Avi was straightforward. It literally took us 20 minutes to get a highly available instance deployed, configured and integrated into our OpenStack cloud, which was awesome.
From a user perspective, the interface is very intuitive and easy to use. There aren’t any superfluous options, and if there are they are cordoned off into their own little bounded context area; network settings, for example, are in a network profile section and, unless you care about that, you don’t need to deal with it or know it exists. You just take whatever the standard is. We were able to point dev teams at it and people with no load balancing experience were able to quickly create highly available load balanced environments.
Where we used to have a centralized network team do all of our load balancing functions, writing custom rules and trying to distill them down for other people to use, now we’re able to distribute these functions to all the operations teams because they are so much simpler.
How is it deployed?
We point it at our OpenStack cloud and it integrates with it. It integrates with the projects, aligns with its multitenancy model, and provisions load balancers on the OpenStack cloud to use. They’re called the service engines. It automatically scales the service engines up and down based on demand. From our perspective, we pretty much carved out an OpenStack project, told Avi to put load balancer VMs there, and it autoscales them in and out as it sees fit.
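As a rough, vendor-neutral illustration (not Avi’s actual logic), demand-based scaling of service engines can be modeled as a simple control loop; the thresholds below are purely illustrative:

```python
# Toy model of demand-based scale-out/scale-in of load balancer
# "service engine" VMs; thresholds and policy are purely illustrative.

def desired_engines(current, conns_per_engine,
                    scale_out_at=1000, scale_in_at=200, minimum=2):
    """Return how many service engines the pool should run, given the
    average number of connections each engine is currently handling."""
    if conns_per_engine > scale_out_at:
        return current + 1   # provision another engine VM in the project
    if conns_per_engine < scale_in_at and current > minimum:
        return current - 1   # tear one down to free capacity
    return current           # load is in range; leave the pool alone
```

The controller re-evaluates this on each cycle, so the pool tracks demand while never dropping below the minimum needed for high availability.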
Were you concerned at all about a potential performance hit, shifting from a hardware to a software-based product?
Initially we were concerned, but so far every single performance test we’ve done, and every single live application we converted, hasn’t shown any performance hit. In fact, in some cases we wound up getting better performance due to the insight and analytics engine pointing out inefficiencies that we had not noticed before.
Did you justify the migration on the promised ease of use, or was there a cost factor as well?
I would say the ease of use. The integration with our strategy, with our private cloud, was the real driver, but there was a cost-saving element to it as well. It wound up being considerably cheaper than our existing solution because it didn’t rely on proprietary hardware; we are just paying for the software, and it scales on the same x86 virtualization platform all of our systems run on.
Any hiccups along the way in terms of implementation or lessons learned?
There are always hiccups. In converting one of our applications over, we found that one of the performance settings we had set was inefficient for that type of application, which was sending very large quantities of HTTP POST data to a service without our knowing it. It wound up being an application where we saw performance increase once we tuned the TCP window scaling settings.
It sounds like the product has worked out well for you.
It has. We’ve gotten to the point where now we’re using it to do blue-green deployments of our applications to achieve full infrastructure automation. As part of a software release we’ll spin up an entire new farm of servers, hook it up to our load balancer, validate it independently, and just switch the load balancer to feed traffic to the new software in one atomic action. We’re automatically standing up new environments, virtual services and load balancer rules through complete automation, and we still get the visibility required. It’s been one of the more successful things at our company.
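The blue-green flow described above can be sketched in a vendor-neutral way. The snippet below models the load balancer cutover as a single atomic pool assignment; `VirtualService` and its pools are simplified stand-ins for illustration, not Avi’s actual API:

```python
# Illustrative model of a blue-green cutover behind a load balancer.
# VirtualService and its pools are simplified stand-ins, not a real LB API.

class VirtualService:
    def __init__(self, active_pool):
        self.active_pool = active_pool   # list of backend servers

    def switch_pool(self, new_pool):
        # One atomic action: all new traffic now goes to the new farm.
        self.active_pool = new_pool

def blue_green_deploy(vs, build_farm, validate):
    """Stand up a new farm, validate it independently, then cut over."""
    green = build_farm()            # spin up the new server farm
    if not validate(green):         # health-check before taking traffic
        raise RuntimeError("green farm failed validation; keeping blue")
    blue = vs.active_pool
    vs.switch_pool(green)           # atomic traffic switch
    return blue                     # old farm, now safe to tear down
```

Because validation happens before the switch and the switch itself is a single assignment, a bad release never receives live traffic and rollback is just switching the pool back.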