Many VMware customers were prevented Tuesday from logging on to their virtual servers after a bug distributed in a software update effectively stopped the boxes from powering up.
According to VMware, the issue involves ESX 3.5 Update 2 and ESXi 3.5 and affects customers powering on virtual machines (VMs) that have been upgraded with those releases. In a statement, VMware said it is "working on an immediate patch for customers in production. VMware expects to fix the issue in code in the next 36 hours once QA testing has been completed."
The company says the date bug only affects customers that had updated their systems with the July 27 releases of ESX 3.5 Update 2 and ESXi 3.5, but VMware has not specified exactly how many customers that could be. VMware is sure to take a publicity hit with the news of a bug that slipped through its fingers, industry watchers say.
"This certainly appears to be the most publicized bug for VMware so far, and I think it is damaging to VMware and virtualization as a whole. The hypervisor is the lowest software level on the server and if you have an issue like this, boom, all your infrastructure is down," says Gary Chen, a senior analyst with Yankee Group. "Software will always have bugs, but a widespread issue like this that affects all VMs is really damaging, especially at this point in time where virtualization is starting to take off. VMware is going to have to fix this fast, provide an explanation, and outline what they will do to strengthen their QA in the future."
Customers around the world have been affected and are sharing their experiences in VMware's forum. One customer wrote: "We've just encountered a serious bug with our ESX cluster -- serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2." The customer goes on to describe messages received from the VM, which in essence state that the product has expired.
According to Chen, the bug prevents customers from powering on a VM, but it doesn't seem to affect VMs already running. A workaround that seems to be effective for now, Chen says, involves setting the date back, powering on the VM and then resetting the date. That may solve the problem in the moment, but Chen says customers may be wary of supporting a homogeneous virtual infrastructure going forward.
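To see why rolling the clock back would let a VM start, consider a minimal Python sketch of a date-based expiry check that runs only at power-on. This is an illustration, not VMware's code: the cutoff date and the function names `can_power_on` and `power_on_with_workaround` are assumptions made for the example, and the article does not describe how the actual check is implemented.

```python
from datetime import date

# Hypothetical cutoff, chosen only for illustration; the article does not
# state the real date or how the product performs its check.
ASSUMED_EXPIRY = date(2008, 8, 12)


def can_power_on(today: date, expiry: date = ASSUMED_EXPIRY) -> bool:
    """Return True if a (hypothetical) expiry check would allow power-on."""
    return today < expiry


def power_on_with_workaround(real_today: date, rolled_back: date,
                             expiry: date = ASSUMED_EXPIRY):
    """Simulate the workaround: set the clock back, power on, restore the clock."""
    clock = rolled_back                     # step 1: set the date back
    started = can_power_on(clock, expiry)   # step 2: the check runs against the old date
    clock = real_today                      # step 3: reset the date
    return started, clock


if __name__ == "__main__":
    today = date(2008, 8, 12)
    print(can_power_on(today))
    print(power_on_with_workaround(today, date(2008, 8, 10)))
```

Because the check in this sketch is evaluated only when a VM starts, machines that are already running would be unaffected, which matches the behavior Chen describes.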
"As enterprises move towards a fully virtualized infrastructure, issues like this certainly will make people think about adopting multiple hypervisors and not putting all your eggs in one basket," Chen says. "If you are 100% virtualized using a single vendor, one software bug or an exploitable security flaw in the hypervisor could instantly freeze your entire infrastructure. These are the risks you take if you have a monoculture; we've seen it before with things like Windows, IE, etc."
For VMware customer Jake Seitz, enterprise architect at The First American Corp., in Santa Ana, Calif., this bug didn't cause any problems in part because he had not upgraded his systems yet and in part because VMware contacted him Monday to alert him on how to avoid a problem by not powering down VMs with the update.
"We were proactively notified by VMware Monday. They told us these are the symptoms and what would happen if you powered down your virtual machines," he explains. "They gave us the general prescription in terms of troubleshooting and things to avoid such as powering down."
While this bug didn't hit his environment, Seitz says it sounded like one of the worst to come out of VMware thus far.
"I would consider this a very severe bug, just by the nature of it. It sounded worse than previous bugs," Seitz says. "Normally they are on top of their game so I am surprised that they missed this one, that it made it through."