LUSH Fresh Handmade Cosmetics of North America is growing like a weed and its storage infrastructure wasn’t keeping up, so network and security team manager Dale Hobbs went looking for an answer. He shared the story with Network World Editor in Chief John Dix.
Let’s start with a thumbnail description of your company.
We were originally founded in the UK and I work for the North American business, which owns the rights for anything to do with LUSH in North America. We answer to the board of directors, so we are closely tied to LUSH in the UK and they have a lot of input into what we do, and the company has the same look and feel from a customer standpoint in both markets.
We have about 240 stores here and for the past couple of years we had been opening about 40 stores a year, but we’re slowing down to about 20 or so per year now. Our head office for North America is in Vancouver, where we have a manufacturing and distribution warehouse that supplies the stores in the central United States and to the west. And in Toronto we have another manufacturing plant and a distribution warehouse for the east.
So that is all independent of the UK, but is IT independent too?
Yes, our IT departments are completely separate. There is no integration. We do our thing, they do theirs.
What does your computing infrastructure consist of?
Today we operate out of a colocation facility, where we supply all of the equipment and they just provide rack space, power and data connectivity. We have a VMware infrastructure. Pretty much everything is virtualized and we have a large number of servers in there, predominantly Windows servers. There are a few Linux servers as well, but most are Windows because we run the standard Microsoft suite of applications. But the servers also support our point-of-sale systems and our ERP platform.
Which brings us around to Nimble. What problem were you having that led you to install their storage array?
The biggest problem was performance. Everything was just slow. You’d boot up a machine and it would take four or five minutes. Email was slow and sluggish and people were starting to complain. It wasn’t always that way. We didn’t grow the infrastructure as fast as the rest of the business grew. For example, when we moved into that data center about two summers ago we had 10-15 servers and only about 5 terabytes of data. Today we have over 150 servers and more than 60 terabytes of data. It was just getting to the point where IT was more of a hindrance than an enabler.
What was happening was our storage couldn’t keep up. We were trying to squeeze 7,000 IOPS out of an iSCSI storage array that was only capable of about 4,000. When we first put it in we thought it was the greatest thing since sliced bread. It was quick and fast. But as the company started to grow rapidly we found it couldn’t keep up with our needs.
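As a back-of-envelope illustration, here is the oversubscription implied by the round numbers quoted above (these are the interview's figures, not measurements):

```python
# Back-of-envelope: demanded vs. delivered IOPS on the old array.
# Figures are the round numbers quoted in the interview.
demanded_iops = 7_000      # what the workload was asking for
capacity_iops = 4_000      # roughly what each old iSCSI array could deliver

utilization = demanded_iops / capacity_iops
print(f"Oversubscription: {utilization:.2f}x")   # queues build and latency climbs

# Even split across two identical arrays, as they later tried:
per_array = demanded_iops / 2
print(f"Per-array load with two units: {per_array / capacity_iops:.0%}")
```

At 1.75x oversubscription, requests queue faster than they drain; even splitting the load across two arrays leaves each near saturation, which matches the "both running at 100%" experience described below.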
Was it a question of needing more boxes or...
No, we actually did put in a second one hoping that would help performance, and it did to some extent because we were able to share the load over the two devices. But at the end of the day both devices were running at 100% and we still needed more out of them; they just couldn’t keep up.
So you went looking for an answer to that. What kind of solutions did you consider?
We reached out to a local vendor here in Vancouver, Long View Systems, and they came back and went through the technologies offered by five or six different vendors (what they offered, the pros and cons, and so forth) and ultimately they recommended sticking with the iSCSI infrastructure and going with a Nimble array. So we decided to try them out.
Why did Nimble appeal to you?
The biggest thing I liked was how their CASL (Cache Accelerated Sequential Layout) architecture handles caching. The other SAN vendors would use flash storage plus 15,000 RPM spinning disks, whereas the Nimble array uses a combination of flash and drives that spin at only 7,200 RPM, which are cheaper, and we get the same or better performance.
Did you have any reservations about the whole flash idea?
I didn’t have reservations about the flash, but we did initially have reservations about the 7,200 RPM drives. It was a bit of a leap of faith because traditionally, when you want data faster, you’re thinking the faster the disks spin the faster you’re going to get your data. But Nimble writes data sequentially and uses the flash for cache, so we get better read/write performance with the 7,200 RPM drives than we did with our old system, which had 15,000 RPM disks.
Nimble doesn’t need the higher speed drives because the majority of the stuff is served out of the cache, and it's intelligent enough to cache what you’re probably going to need. So, it intuitively figures out, “You’re asking me for an apple, but you might also ask me for an orange and a banana so I’m also going to put them in the cache just in case.”
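Nimble's CASL internals are proprietary, so the following is only a toy sketch of the general read-ahead idea Hobbs describes; the class and variable names are made up for illustration:

```python
# Toy illustration of read-ahead caching, in the spirit of the apple/orange/banana
# example above. This is NOT Nimble's CASL implementation -- just the general idea:
# on a cache miss, also prefetch the blocks likely to be asked for next.
class ReadAheadCache:
    def __init__(self, backing_store, prefetch=2):
        self.store = backing_store   # dict of block_id -> data (stand-in for slow disk)
        self.cache = {}              # stand-in for the flash tier
        self.prefetch = prefetch     # how many sequential neighbours to pull in
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        # Miss: fetch the block plus its sequential neighbours into cache.
        for b in range(block_id, block_id + 1 + self.prefetch):
            if b in self.store:
                self.cache[b] = self.store[b]
        return self.cache[block_id]

disk = {i: f"data-{i}" for i in range(10)}
cache = ReadAheadCache(disk)
for i in range(6):      # a sequential scan, like a compile reading source files
    cache.read(i)
print(cache.hits, cache.misses)   # prints "4 2": most reads hit cache after each miss
```

Sequential workloads like the ERP compile mentioned later are the best case for this pattern: one disk miss pre-warms the cache for the reads that follow.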
How did it work out?
It absolutely floored us, to be honest with you. It completely blew us away. When we first put in the device we were in the development phase of a new ERP platform and one of the biggest complaints for the team rolling that out was the software compiles were taking 12, 13, 14 hours to run. So that was one of the first systems we moved over to the Nimble array and we used that as our benchmark. The first time they ran that compile it took 24 minutes.
Wow. And that’s simply because all the critical components are cached in flash?
Yeah. We thought maybe something was wrong because it was such a drastic jump. So we had them run it again and then had a couple other people look at it and everyone kept coming back with the same results. Now it's 25 minutes, 24 minutes, 23 minutes depending on how busy things are on the network. When we had four or five different people giving us the same answer, we were pretty confident this was a good decision.
How did you handle the cutover?
We installed the Nimble array in the rack in the data center alongside the rest of our servers and the old storage, and just migrated some of our virtual machines off the old storage onto the new storage system. Our VMware cluster was able to see the new storage and start utilizing it, so we moved systems over live. We didn’t have to shut them off because VMware handles the migration in the background with vMotion.
So it started off as a performance issue, but obviously you had to consider capacity as well.
Due to the way Nimble compresses data, we were able to fit about 40 terabytes of data into roughly 22 terabytes of physical disk. We’ve already outgrown that and added a storage expansion shelf in the last couple of months.
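The compression ratio implied by those two figures works out roughly as follows (a sketch using only the numbers given in the interview):

```python
# Compression ratio implied by the figures in the interview.
logical_tb = 40      # data as the servers see it
physical_tb = 22     # physical disk actually consumed on the array

ratio = logical_tb / physical_tb
savings = 1 - physical_tb / logical_tb
print(f"~{ratio:.2f}:1 compression, i.e. ~{savings:.0%} less physical disk")
# prints "~1.82:1 compression, i.e. ~45% less physical disk"
```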
Did you unplug your older storage system or are you just adding Nimble to the mix?
The Nimble basically replaced it. We moved everything off the old storage array onto the Nimble array, and once everything was moved off we repurposed the old equipment for our backup environment. We’re still utilizing it, but just for less resource-intensive tasks.
How does Nimble scale?
Quite easily, actually. The brains of the unit are on controller cards and everything is modular. It’s like Legos. If you need to upgrade from a 1 gigabit network to a 10 gigabit network, you just pull the standby controller, replace its network card with a 10 gigabit card, do a manual failover so everything moves over to the new network, and then do the same thing on the other controller. The same goes for capacity. If you need more capacity you buy an additional expansion shelf, plug two cables into the main unit, and as soon as it’s online you activate it and that’s it.
If you need to add more performance or more cache, you buy a cache shelf and it’s the same thing. You just plug it in, and as soon as it comes online the unit sees it; you activate it and it’s in production. That’s it. It’s just that simple. It’s the equivalent of driving down the highway and replacing your transmission without stopping your car.
How were you justifying the investment?
The performance gain and the modularity (the ability to expand the system easily) were enough for us.
How did it compare pricewise with the older system?
The older system was a lot cheaper but didn’t perform anywhere close to how the Nimble performs.
Was there any learning curve in terms of implementation or operation?
No, almost nothing. To get the previous arrays up and running probably took four to five hours’ worth of effort, and then figuring out how to use the device was another day or so. To get the Nimble online, they came onsite, did the install for us, and did the initial configuration in maybe an hour and a half, and we had our first test machine moved over within about three or four hours. Once we were satisfied everything was okay we started moving things over pretty much rapid-fire. In about three or four days we moved our entire infrastructure from our old system to the new one and never looked back.
There’s very little we have to do there. The only time we really have to touch it is if we need to build a new server and need to carve out some disk space. Aside from that, we don’t really have to monitor it. Any time there has been any issue, we got an email from Nimble support saying, “Hey, we found x, y or z problem and it’s already fixed.”
They’re proud of that proactive support capability and talk about it as being a key differentiator, and it actually works?
It does, absolutely. I can’t think of a single instance where I have actually had to call them and say, “Hey, we found this issue on our array.” Usually it’s the other way around, them saying, “Hey, we found this on the array but don’t worry, we’ve already fixed it for you.”
That’s pretty snazzy. Any surprises at all?
The only surprise is how easy it is to work with. It really is. I’ll go back to the initial apprehension about using the slower drives. That was the biggest leap of faith we had to take and it’s also been one of the key points whenever we’re talking to anybody about it. When I’m talking with people about storage arrays and they say “We use these drives and here’s the performance we get,” I’m like, “I get double your performance and I only use these drives.” They don’t believe it. They think I’m lying.
Were you worried about dealing with a smaller company versus one of the big industry stalwarts?
That was a little bit of a concern initially, but we trusted our vendor to give us the best information they could and to provide something that was reputable and that was going to be a good long-term solution.