The battle of the reboot

Patching has become routine, but patches don’t take without a reboot. That’s a problem when business units insist on zero downtime.

I’ve talked a lot about patching, ever since my first column for Computerworld way back in 2008. Back then, I had to struggle with the IT department to get any patching done at all. The backlog was immense: neither the Windows operating systems running on our servers nor the applications installed on them had ever been patched. My vulnerability scanner found literally hundreds of thousands of patchable vulnerabilities.

It took a couple of years of steady effort to work down that backlog and get everything current. And then we had to start on workstations. After all that, we finally reached a stable baseline with few vulnerabilities, but the next challenge was to start patching on a regular basis and keep all our computers updated month to month.

Fast forward to today, and everything is now on an even keel, as I mentioned in a recent column. Finally, patching has become a routine system administration practice no more difficult than log file management, end-user account administration and other relatively painless procedures. We even have special patching software that automates many of the system administration tasks required to deploy patches and software updates to all our computers.

There’s just one problem remaining: Windows computers must be rebooted to complete the patch installations. And because a reboot takes the computer out of service for a few minutes, it causes downtime. And when that system is dependent on other systems, or vice versa, rebooting can cause a chain reaction that cripples critical software services. So in fact, the simple act of rebooting a computer to complete the patch installations is the hardest part of the job.
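To make the chain-reaction problem concrete, here is a minimal Python sketch of how a reboot plan could respect dependencies between systems. It is an illustration only, not a description of our patching software: the host names and the dependency map are hypothetical, and it assumes each system's dependencies are already known and contain no cycles.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration):
# each host maps to the set of hosts it depends on.
DEPENDS_ON = {
    "app-server": {"db-server", "auth-server"},
    "web-server": {"app-server"},
    "db-server": set(),
    "auth-server": set(),
}


def reboot_waves(depends_on):
    """Group hosts into waves so that every host in a wave depends
    only on hosts placed in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def affected_by(host, depends_on):
    """Return every host that depends, directly or indirectly, on `host`."""
    affected = set()
    frontier = [host]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    for number, wave in enumerate(reboot_waves(DEPENDS_ON), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
    print("Rebooting db-server disrupts:", sorted(affected_by("db-server", DEPENDS_ON)))
```

The first function suggests an order in which systems could be brought back up, and the second lists everything that feels the outage when a single machine goes down. In practice, that second list is the one the business units care about.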

This problem came to my attention in a recent vulnerability management meeting. I have these meetings every week with my company’s IT department, to go over the latest vulnerability scan results with them, plan next steps and make sure nothing gets missed. As we reviewed the scan results, it became apparent that several servers had fallen a couple of months behind on patches. This came as a surprise, because as I said, patching has become routine.

When I asked why those systems weren’t getting patched, one of the system administrators said the patches had in fact been installed, but the systems hadn’t been rebooted. From the system administrator’s point of view, his work was done. He had applied the patches and figured the last step, the reboot, wasn’t such a big deal. From my point of view, though, the vulnerabilities still existed, because a patch that is waiting on a reboot hasn’t actually taken effect.

But it wasn’t as simple as me asking him to reboot those computers. He hadn’t been given permission to do so. “The business won’t let me bring down the application right now,” he said. “They have a big deadline coming up, and they don’t want any downtime.”

“Are you kidding me?” was my response. “Surely they can tolerate a five-minute outage in the middle of the night when nobody is working.” But I found out later, when I called the business unit manager, that they were running overnight processes that would be corrupted by stopping the services.

So I tried to find a time when everybody could agree to do the reboots. Unfortunately, we haven’t come to that agreement yet. After the business unit’s deadline has passed, we should be able to resolve this. But my main concern is not with this particular situation; it’s with the general challenge of business units requiring 100% uptime on computers that need to be rebooted at least once a month. This is going to take some negotiation and planning.

For the moment, though, I’m going to have to live with some accumulated vulnerabilities. I could take a hard line and insist on rebooting the servers, but knowing that that would compromise the business unit’s work, I’ve decided to be flexible. We need to find a solution to the overall problem of regular system rebooting (and other system administration tasks) in a mutually agreed “maintenance window” where IT can take over all the computers for a while every month.

But secretly, I’m hoping for a power outage.

This week's journal is written by a real security manager, "J.F. Rice," whose name and employer have been disguised for obvious reasons. Contact him at jf.rice@engineer.com.
