Tribune Media rebuilds IT from the ground up and is living the dream
- 07 January, 2016 23:13
Tribune Media CIO David Giambruno got the rare opportunity to build IT for a $2b company from the ground up after the Tribune Company was split in two in 2014. The split created Tribune Publishing and Tribune Media, the latter being the largest independent broadcasting company with 42 TV stations, a movie studio and divisions involved in everything from sports metadata to real estate. Giambruno lives in Raleigh, North Carolina, and has offices in Raleigh, New York and Chicago. Network World Editor in Chief John Dix recently talked to Giambruno about what is possible when you’re given a blank sheet of paper. Perhaps not surprisingly, the answer resembles the fabled Software Defined Data Center.
It must have been a monumental job to divvy up a company while it was still in flight, so to speak.
When you’re splitting a company you have to keep it running, so there was a decision made that Publishing was going to keep all of the legacy [gear] and we would have a transition services agreement to support us in the interim. Then essentially I got to build greenfield. I had to consolidate all our “stuff” from 54 data centers into something new. There are very few times in your career you’re totally unencumbered, so I looked around and said, “What can we do?”
We knew we had to build the traditional backend to support the services side of IT, so DNS, email, all of that stuff no one ever sees. There’s no cosmic joy to backend systems. Everybody just expects them to work, like electricity. To visualize the job of splitting the company I used a grilled cheese sandwich metaphor: It looks nice and neat, but when you cut it and pull it apart you get all the gooey, messy stuff. Every corner-cutting, field-expedient fix from the last 30 years comes back to haunt you. The only downside of the metaphor was getting the picture for the PowerPoint. My 8-year-old daughter and I had to make five grilled cheese sandwiches to get the right shot, and I had to eat them all.
But the fun was figuring out how we were going to build that next generation platform. How were we going to keep costs down, how were we going to automate it and avoid boxing ourselves in? Technically that meant building a platform that could adapt, consume, scale, eject and execute with predictable precision using cloud, containers, XaaS and whatever Silicon Valley throws out for the next several years. Financially it meant disconnecting capability and cost. If I want to add five things I don’t want to have to pay five bucks, I want to pay two bucks.
So we decided to build a private cloud since we only had five months to split everything. The first target was to get to 90% virtualized from 60% so we could move everything. And one of the first things we did was a bakeoff between VMware and OpenStack. We had 26 people doing OpenStack (because it was cool) and four people working on VMware, and the goal was to get 1,000 servers running inside of a month. At the end of the first week the VMware team was done and dusted. At the end of the month the OpenStack people still had nothing. Choice made.
So we started the process of lighting up VMware and migrating our applications. Essentially we were running two horses: one was the infrastructure and all of its services, and the second was moving all of the applications. The team building the entire infrastructure backend was only nine people. That was it. I’m incredibly proud of them.
I presume you were migrating to an x86 hardware environment?
All x86. No mainframes, AS/400s, etc. I have Wintel platforms and only have five physical servers that don’t run virtualized. Otherwise we’re running roughly 1,200 servers on 79 physical hosts.
When we were getting ready for the network one vendor quoted me $1.5 million to do my core. We ended up going with NSX from VMware and $70,000 worth of Juniper in the core because, with NSX, I can use much less expensive stackable hardware since much of the intelligence for logical redundancy is sitting in software. Juniper also has the best XML parser.
And if I understand it right, one of the reasons you could consolidate so much of your compute resources was because you offloaded some workloads to the cloud?
Right. I don’t have PeopleSoft Financials or PeopleSoft HR anymore. Those are 800-pound gorillas and we replaced them with Workday Financials and Workday HR, which are SaaS services, and Anaplan for budgeting and FP&A [Financial Planning and Analysis]. So that huge erg of horsepower that took up a chunk of the data center is no longer onsite. In raw numbers, about 80% of our applications are still on-prem and 20% are in the cloud, but in a compute sense we’re about 50%/50%.
Literally, this is like a $2 billion startup, or re-start. And we did the whole thing in five months. Management’s assignment to me was to build a “Frictionless Enterprise,” which is a very succinct and clear goal. The power in that vision is the focus it enables me to give my team.
People say, “Wow. You did all this?” But it is like Captain Kirk and Star Trek’s Kobayashi Maru test. I say, “Yeah, but I cheated.” It’s just what is possible now. I did not have the baggage of a legacy enterprise, and I had clarity of mission. The outcome was like going from the Flintstones to the Jetsons.
We set up the environment and got everything running and ready by the end of May 2014. We went live August 4, moved all apps, and collapsed 54 data centers onto seven racks with nine help desk calls. It’s one of these funny visuals because we’re a pretty big company. You walk into my data center and expect to see rows and rows of stuff; it’s literally seven racks. I have to take people in and show them, saying, “Really, that’s it … My data center that got caught in the dryer.”
The magic to me of an internal cloud is all of my data is in a single place. All the other benefits pale compared to the ability to have all my data in one place and to copy it at will anywhere, giving me what I call indiscriminate compute. So as we put in our API layer it makes it really easy to move information in and out, to control that information. We’re still going through the whole micro-segmentation piece, but the ability to wrap our data with a common security profile and push that out externally, it changes the operating metaphors.
I use the term indiscriminate compute, but it is really compute, storage and the network -- being able to move and extend that anywhere the business needs while knowing where it is, what it’s doing and who has access, so it still stands up to an audit.
If I want to take my internal servers and go to AWS or Azure or some other provider down the road, we’ll be able to do that. We’ve already pushed some stuff to AWS as a test. We just don’t have a need right now for public cloud because we have capacity and because of latency problems in public clouds. That’s quickly going away but it’s still there. I always joke, bandwidth is cheap, but latency is priceless.
Going back to NSX, did you discover anything along the way that NSX couldn’t do?
In the beginning the hardest part was not the technology, it was the ecosystem. It’s fairly young, and the hardest part was getting other vendors to offer virtual instances of their physical hardware. Everybody talks about virtual appliances, but it’s a big shift. “I’ve been creating physical boxes forever and now you just want me to give you a piece of software?” It is a total frame of reference shift for the suppliers and their business and revenue models.
We even had problems just getting SKUs. So you really have to work on the ecosystem. That was really the only frustration. The technology itself worked. But we got to walk into it. We weren’t lighting up 50,000 nodes. We slowly lit stuff up, we learned, we got better at it. I highly recommend the crawl, walk, run approach … but it is eminently doable. You do need the right people. I am blessed with an awesome team that loves the challenge and has that one key quality: curiosity. Curiosity has to be fostered, and that comes down to leadership and supporting your team.
We’re all about simplicity. I call it the Southwest of computing. Southwest Airlines uses one type of airplane, so they have one set of mechanics and one set of parts. So what I strive for is, get really good at a set of technology, own it and wield it and get the most out of it we possibly can. Wield the technology. I don’t worry about vendor lock-in because my threat is binary. What I mean by that is, if you make us really, really mad, we’ll just take everything and rip it out and replace it. We work very hard at getting really good relationships with our vendors. But if it goes bad it’s divorce court. You’re not going to lose 10% of my business; it’s all gone.
So that’s the way I approach it because I think simplicity wins in the long run. Workday is similar in that everybody runs on the same version. It’s like an apartment building where you get the same floor plan. You can change your paint and your sink but pretty much everything else is the same. We just went through a Workday 25 upgrade and I’m used to SAP and Oracle upgrades that take months of prep, lots of money and lots of consultants. This was a team of eight, two weeks, a four-hour upgrade, you’re done. You go, “Wow. That was easy.”
The same metaphor applies to infrastructure now. We’re still bound by applications, but it is really about how you wield the tech. It is how do you disconnect the cost and become scalable and run those things that need to be run in the background without anyone getting sticker shock from cost or effort at every change. The best compliment I get from the management team is they don’t have to think about me.
Give us some perspective on how you’re benefiting here in the new world.
Before we split the company in half we had 585-ish people in IT, so you would assume I would end up with 200 to 300 people. I am running everything, infrastructure, apps and support and development, with 43 people. Even better, I didn’t have to fire anyone. Only 25 people transitioned from the combined company to Tribune Media.
But the thing I look at most is business alignment, and I look at it really simply: It is your ability to do more with less. I am not revenue generating, but it doesn’t mean we can’t be innovative and engaging. My team uses technology to give the business a competitive advantage: speed. The more I can focus my team on delivering projects, the better the business is.
I still need the “bump in the night, worst case scenario team,” but the paradigm shift keeps reducing operational risk and enabling my team to work on business projects. This is quantified in the number of projects we deliver. The new capabilities enable us to get the infrastructure out of the way so we can do more. In 2014 I believe we got a little over 140 projects done. Year-to-date we’ve crushed it with over 245. And these are massive projects. We’ve built an entire backend for a company. We’ve built shared services to big data, we’ve built all this stuff. We’ve taken the infrastructure out of the way so everything becomes easier. One of the best examples is the speed at which we deployed Workday Financials. We had the fastest go live ever for a company our size.
What you see is people getting more done faster, and that changes people’s frame of reference about what can be done and how long it takes. One of our team’s rallying cries is, “Everything begins and ends with an IP address.” That combines with my team’s penchant for automation.
Mike Cannella, one of our Cloud Engineers, worked with Infoblox and VMware to create an awesome integration. Now it’s literally a click of a button to automate VM provisioning, birth to death. Click, and a VM gets an IP address that’s entered in DNS (according to its naming standard: production, dev or test), it gets a life span, and it gets an owner. If it’s a test box, the owner starts getting nagged at 75 days to see if they still need it, and if they don’t respond it just gets deleted at 90 days. This is where the operating metaphor of a Software Defined Data Center shines.
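The provisioning flow described above (address, DNS entry, lifespan and owner assigned at birth; nagging at 75 days and deletion at 90 for test boxes) can be sketched as follows. This is a minimal in-memory illustration, not the Infoblox/VMware integration the team built: `ToyIPAM`, the `prd`/`dev`/`tst` hostname prefixes, the subnets and the `corp.example` domain are all hypothetical, and the sketch assumes an owner’s confirmation simply exempts a test VM from expiry.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum

# Lifecycle thresholds from the interview: test VMs are nagged from day 75
# and deleted at day 90 if the owner has not confirmed they are still needed.
NAG_AFTER = timedelta(days=75)
DELETE_AFTER = timedelta(days=90)

# Hypothetical naming standard: the hostname prefix selects the environment.
ENV_PREFIXES = {"prd": "production", "dev": "dev", "tst": "test"}

# Hypothetical per-environment subnets; a real IPAM would own these.
SUBNETS = {
    "production": ipaddress.ip_network("10.10.0.0/24"),
    "dev": ipaddress.ip_network("10.20.0.0/24"),
    "test": ipaddress.ip_network("10.30.0.0/24"),
}


class Action(Enum):
    KEEP = "keep"
    NAG_OWNER = "nag_owner"
    DELETE = "delete"


@dataclass
class VM:
    name: str
    owner: str
    created: date
    owner_confirmed: bool = False  # owner replied that the VM is still needed


def environment(name: str) -> str:
    """Map a hostname to its environment, e.g. 'tst-web01' -> 'test'.
    Raises KeyError for names that break the naming standard."""
    return ENV_PREFIXES[name.split("-", 1)[0]]


class ToyIPAM:
    """In-memory stand-in for an IPAM/DNS service (not the Infoblox API)."""

    def __init__(self, domain: str = "corp.example") -> None:
        self.domain = domain
        self.a_records: dict[str, ipaddress.IPv4Address] = {}  # FQDN -> address

    def provision(self, vm: VM) -> str:
        """Birth: allocate the first free address in the VM's subnet and
        create its DNS A record in one step."""
        subnet = SUBNETS[environment(vm.name)]
        in_use = set(self.a_records.values())
        address = next(ip for ip in subnet.hosts() if ip not in in_use)  # StopIteration if full
        fqdn = f"{vm.name}.{self.domain}"
        self.a_records[fqdn] = address
        return fqdn

    def decommission(self, vm: VM) -> None:
        """Death: drop the A record, which also frees the address."""
        self.a_records.pop(f"{vm.name}.{self.domain}", None)


def lifecycle_action(vm: VM, today: date) -> Action:
    """Only unconfirmed test VMs expire; everything else is kept."""
    if environment(vm.name) != "test" or vm.owner_confirmed:
        return Action.KEEP
    age = today - vm.created
    if age >= DELETE_AFTER:
        return Action.DELETE
    if age >= NAG_AFTER:
        return Action.NAG_OWNER
    return Action.KEEP
```

Provisioning `VM("tst-web01", "alice", date(2016, 1, 1))` against a fresh `ToyIPAM` would register `tst-web01.corp.example` at 10.30.0.1; from day 75 `lifecycle_action` returns `NAG_OWNER`, and from day 90 it returns `DELETE` unless the owner has confirmed. Keying the address, the DNS record and the expiry policy to the same name is what makes cleanup at “death” a single step.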
How are you handling storage?
Storage is in major flux. One of my Cloud Engineers, Ben Gent, is really awesome at storage, and he gave me this PowerPoint titled, “How do I fire myself?” That obviously took a certain degree of courage, but it reflects the new operating possibilities in storage.
He goes, “Here’s what I want to do. I want to complete the virtualization stack with VSAN and set up a storage sandwich. Pure Storage (flash) on top to address high performance needs, VSAN on cheap servers in the middle to support the bulk of our needs, and Cohesity on the bottom for backups, replication, de-duplication and recovery, and then we’ll automate the whole thing so the help desk can provision storage and give you millions back in savings over the next three years.”
Can’t really argue with that. I asked him how long it will take and it turns out he already had it running in the lab. So we are moving to a model where 25-30% of our capacity runs on flash, the lower 70% runs in VSAN, and Cohesity, which is coming online soon, does the deduplication and disaster recovery.
The team has set up auto-tiering so applications get the storage performance they need. VSAN is running on commodity servers with JBOD, with the software managing everything, and we’re getting 2-2.4 ms latency with Cohesity doing backups at 35% compression.
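The interview does not say how the auto-tiering decides placement. One common heuristic, shown here purely as an assumed stand-in for whatever Pure Storage and VSAN actually do, is to rank workloads by I/O density (IOPS per GB) and fill the flash tier hottest-first until it reaches its share of capacity, here defaulting to the 30% upper bound mentioned earlier. Backups to the Cohesity tier are treated as a separate path and left out. The `Workload` fields and the IOPS figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    size_gb: float
    iops: float  # observed average IOPS (hypothetical metric)

    @property
    def heat(self) -> float:
        """I/O density: IOPS per GB of footprint."""
        return self.iops / self.size_gb


def place(workloads: list[Workload], flash_fraction: float = 0.30) -> dict[str, str]:
    """Greedy hottest-first placement: fill flash up to flash_fraction of total
    capacity and send everything else to the VSAN tier."""
    budget = flash_fraction * sum(w.size_gb for w in workloads)
    used = 0.0
    placement: dict[str, str] = {}
    for w in sorted(workloads, key=lambda w: w.heat, reverse=True):
        if used + w.size_gb <= budget:
            placement[w.name] = "flash"
            used += w.size_gb
        else:
            placement[w.name] = "vsan"  # too cold, or flash budget exhausted
    return placement
```

With a 100 GB database at 10,000 IOPS, a 150 GB web tier at 1,500 IOPS and 700 GB of file shares at 700 IOPS, the default 30% budget (285 GB) puts the database and web tier on flash and the file shares on VSAN; dropping the budget to 25% (237.5 GB) pushes the web tier down to VSAN as well.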
The outcome is we are going to affect long-term operational costs while increasing capability. One other example: We are using this Software Defined Data Center architecture to move data centers. My team is doing it with four people, only one external. Think about that. So it’s all about wielding technology, changing the frame of reference of what can be done and how one uses the word can. “Why can’t I?” vs. “I can’t.”
How have you organized the team?
You have to have an organizational structure, but no one lives in a silo. The hardest part is taking most of the vertical structures and making them horizontal. I would argue I run a DevOps-type world. If we need to run a project the project team collaborates with engineering, engineering talks to operations, operations talks to support. It’s just a constant cycle. The engineers are third-level support. So if they build something crappy they’re up late at night dealing with it. That only happened a couple of times.
Value isn’t building something every day. My engineers aren’t about building; they’re about wielding technology. It’s more about how are we going to leverage this capability and what can we do with this? Because it’s just compute, storage and networking.
You mentioned using NSX for micro-segmentation. Can you expand on that?
Having to put physical firewalls in between everything is just not scalable. I’ve got 1,200 virtual servers. If those were physical, I’d have another seven racks just for the firewalls. Now I don’t have to. And with NSX I can have a security policy and use that to wrap a server, an application, a piece of data, whatever I define, and that’s really powerful. We’re also using NSX APIs to integrate Cyphort, so we have them looking for advanced persistent threats between all my east-west traffic. I’ve never been able to think about doing that before.
And going back to the indiscriminate computing idea, the ability to move from a private cloud to a public cloud, if you have an NSX capability you can ensure security is moving with it because everything is now a file and you “wrap” that file with a security posture.
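The idea that security moves with a workload follows from keying policy to identity rather than to network location. The sketch below models a default-deny distributed firewall whose rules match security tags on both ends of an east-west flow. It is a conceptual illustration only, not NSX’s API or rule model; the `Endpoint`, `Rule` and `Policy` types, the tag names, the port and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Endpoint:
    name: str
    ip: str
    location: str          # e.g. "private-cloud" or "public-cloud"
    tags: frozenset[str]   # security-group membership, e.g. {"tier:web"}


@dataclass(frozen=True)
class Rule:
    src_tag: str
    dst_tag: str
    port: int


@dataclass
class Policy:
    rules: list[Rule] = field(default_factory=list)

    def allows(self, src: Endpoint, dst: Endpoint, port: int) -> bool:
        """Default deny: a flow passes only if some rule matches a tag on each
        end and the port. IPs and locations are deliberately ignored."""
        return any(
            rule.src_tag in src.tags and rule.dst_tag in dst.tags and rule.port == port
            for rule in self.rules
        )


def migrate(endpoint: Endpoint, ip: str, location: str) -> Endpoint:
    """Move a workload: address and location change, tags (and thus policy) do not."""
    return replace(endpoint, ip=ip, location=location)
```

Because `allows` never consults `ip` or `location`, a database tagged `tier:db` keeps accepting traffic from `tier:web` on port 5432, and rejecting everything else, after `migrate` moves it to a public cloud with a new address.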
Are your physical servers in one location and supporting all your different locations?
Yeah, but obviously I have a DR site and I replicate everything. We use Riverbed’s Granite, which lets us project a LUN. We take our applications and push them out to the edge so the user feels like they’re onsite; only the changes come back, and if we lose connectivity it still runs. So we do that as much as possible for data consistency. We had a user delete everything, totally bombed an app, and we reloaded the thing in five minutes. It works like a champ.
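“Only the changes come back” describes a block-level delta. Riverbed’s actual mechanism isn’t described in the interview, so the sketch below uses a simple assumed approach: split the volume image into fixed-size blocks, compare SHA-256 hashes, and ship only the blocks that differ. The 4 KB block size and the function names are hypothetical, and production systems typically track dirty blocks as writes happen rather than rehashing the whole image.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; hypothetical block size


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block (the last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for every block of `new` that differs
    from `old`: the only data that would travel back to the data center."""
    old_hashes = block_hashes(old, block_size)
    delta: dict[int, bytes] = {}
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * block_size
            delta[idx] = new[start:start + block_size]
    return delta


def apply_delta(base: bytes, delta: dict[int, bytes], new_len: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new image from the old copy plus the changed blocks."""
    buf = bytearray(base[:new_len].ljust(new_len, b"\0"))  # truncate or zero-pad
    for idx, block in delta.items():
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)
```

Flipping a single byte in a 10,000-byte image yields a delta of one 4 KB block, and `apply_delta` rebuilds the new image from the old copy plus that block.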
Besides my idea around indiscriminate computing, I have another one: I don’t want data ever leaving my data center. I want to project my data. I want to let you, depending on your rights, interact with it, but I don’t want that data floating around on devices. That really reduces our security surface area.
You haven’t mentioned container technology, which seems to be an increasingly popular tech. Do you have any container plans?
A VM is one operating system to one application, while a container is one operating system to many applications, so you can achieve greater density with your hardware. We’ve done some tests where you get up to 80-100 to one, but of course it depends on the workloads. I think there’s a place for it but we’re kind of waiting. VMware is working on it. And we’re still in this tactical curve of rebuilding a $2 billion enterprise.
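The density argument can be made concrete with a memory-only model: every VM carries its own guest OS, while the containers on a host share one. The sizes below (a 256 GB host, a 2 GB OS footprint, 0.5 GB per application) are hypothetical and are not Tribune’s test figures; CPU, I/O and hypervisor overhead are ignored.

```python
def vms_per_host(host_gb: float, guest_os_gb: float, app_gb: float) -> int:
    """One OS per application: every VM pays the guest-OS memory overhead."""
    return int(host_gb // (guest_os_gb + app_gb))


def containers_per_host(host_gb: float, host_os_gb: float, app_gb: float) -> int:
    """One OS for many applications: the OS overhead is paid once per host."""
    return int((host_gb - host_os_gb) // app_gb)


# Hypothetical sizing, not measured figures.
HOST_GB, OS_GB, APP_GB = 256, 2, 0.5

if __name__ == "__main__":
    print("VMs per host:       ", vms_per_host(HOST_GB, OS_GB, APP_GB))
    print("Containers per host:", containers_per_host(HOST_GB, OS_GB, APP_GB))
```

With those numbers a host fits 102 VMs but 508 containers, roughly five times the density, and the gap widens as application footprints shrink relative to the OS.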
We’ve launched everything. We’ve gone live with our app portfolio. We’re operationalizing all that and we’re going in now to optimize it. In mid-2016 we’ll step back and take another look at technology and containers are on our roadmap.
Any regrets starting from scratch vs. pushing to keep some legacy infrastructure?
My team got to do a greenfield, and it was a brutal pace. Last year was the hardest that I have ever worked. But we came out of it with a next-generation operating platform (infrastructure and applications) so the business can face forward and think about increasing shareholder value.
From an IT perspective, teams tend to either spiral up or spiral down. My team is in this spiral up mode. For example, vSphere 6.2 just came out and my team is excited about what they can do with the technology. Some other shops will be like, “Oh my God, we have to do this upgrade,” and spiral down.
It’s having a team that’s excited about technology and what it can do to help the business. We’ve tracked a couple of engineers to see how much they can do and you can see just how much this technology empowers people. It’s very different because historically the server guy is the server guy, the network guy is the network guy, etc. I don’t have server or network or storage people. I have cloud engineers. Because all of that works as a system. People still have their natural affinities, what they’re better at, but to all of them it’s just bits. It’s interesting to watch how they work together and how they communicate and how much they can get done because there aren’t those artificial silos anymore.
It sounds like you guys are having a blast.
I promise my team: one, you’ll never be bored; two, you’ll never have as much fun, because I don’t do technology for the sake of technology, it has to have an outcome; and three, we will develop a world class team where we make people great and where they want to come to work every day.
We live by five operating principles founded on this one fundamental idea: Our job is to create a Frictionless Enterprise:
- We are here to help. Period.
- Enable the business to turn around and face forward.
- Make systems work for people, rather than people working for systems.
- Command technology to provide competitive advantage for the business.
- Provide management teams actionable information to make good decisions.