I think it has happened to every vSphere admin at some point… the huge VM you’re so proud of reports that it needs to consolidate a disk (of course it’s the large data disk), or the automatic snapshot of that backup job was not automatically removed (hello Veeam folks!). Usually removing snapshots or consolidating disks is no big deal; vCenter handles the process automatically in the background. I throw both processes in the same basket, as they’re basically the same thing.
Sometimes, however, things go wrong before you even get to the consolidation/removal process: a disk is corrupted, or one or more snapshots, usually large to extra large, are orphaned and you need to clean up. In my experience the IT devil preferably hits the VMs with heavy, critical loads and disk sizes in the two-digit TB range… The operating system crashed, the VM crashed, or something else on the “no bueno” list happened. One of the things vCenter keeps screaming about is “Virtual machine disk consolidation is needed”. Or some peripheral application like Veeam ran into issues because of the orphaned snapshot. There are a multitude of reasons you may get into this situation.
So before you go on a Google frenzy to find the magic button for this, let me shed some light on the scenario and what you can do:
How do you clean up the situation as quickly as possible? Snapshot removal and disk consolidation can usually be run in any VM power state. If the VM is powered on, it will take longer, as the guest OS’s disk I/O will constantly interfere with the removal/consolidation task. So if you want it to run quicker, power down the VM first.
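If you prefer scripting over clicking, the check-then-consolidate step can be sketched roughly like this in Python. This is only a minimal sketch under assumptions: `vm` is assumed to be a pyVmomi `vim.VirtualMachine` object you have already looked up through your own vCenter connection (not shown here), and `start_consolidation` is a hypothetical helper name, not something shipped with any library. It only starts the task and hands it back to you; watching the task until it finishes is left to you.

```python
def start_consolidation(vm):
    """Start disk consolidation on a VM if vCenter reports it is needed.

    Assumption: `vm` behaves like a pyVmomi vim.VirtualMachine, i.e. it has
    `name`, `runtime.consolidationNeeded`, `runtime.powerState` and a
    `ConsolidateVMDisks_Task()` method. Returns the started task object,
    or None if no consolidation is needed.
    """
    runtime = vm.runtime

    # Nothing to do if vCenter does not flag the VM.
    if not runtime.consolidationNeeded:
        return None

    # Consolidation works in any power state, but guest I/O slows it down.
    if str(runtime.powerState) == "poweredOn":
        print(f"{vm.name} is powered on; expect consolidation to take longer.")

    # Kick off the task. There is no switch here to cancel it later.
    return vm.ConsolidateVMDisks_Task()
```

The function deliberately does not power off the VM for you; whether the downtime is acceptable is a decision for a human, not for a script.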
Don’t expect miracles: the process itself is rather… slow, and the time it takes depends mainly on the VM disk size and the snapshot size and, to a lesser extent, on the hardware (ESXi server and storage). As a very rough estimate, expect the process to run for about one hour per TB of disk capacity. At one point I had to run consolidation on a VM with a 49TB disk and yes, it took a bit more than two days to finish.
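To give the manager a number, the rule of thumb above can be turned into a tiny Python calculation. Treat it as a back-of-the-envelope sketch: the one hour per TB figure is just my rough estimate from experience, and the function name and its `hours_per_tb` parameter are made up for this example.

```python
def estimate_consolidation_hours(disk_tb, hours_per_tb=1.0):
    """Rough consolidation time in hours for a disk of `disk_tb` terabytes.

    `hours_per_tb` defaults to the rule of thumb of about one hour per TB;
    adjust it if your own hardware behaves differently.
    """
    if disk_tb < 0:
        raise ValueError("disk size cannot be negative")
    return disk_tb * hours_per_tb


hours = estimate_consolidation_hours(49)
print(f"about {hours:.0f} hours, or roughly {hours / 24:.1f} days")
```

For the 49TB disk this gives about 49 hours, which matches the "bit more than two days" I actually saw.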
Speed it up!
You started the consolidation/snapshot removal process and the manager is breathing down your neck, demanding that the system be up and running again ASAP?
You: “There must be a hidden turbo switch for this kind of task, right? Some way to speed it up?”
Me: I’m sorry, but no there isn’t.
You: “But the manager wants the VM running NOW, so can we stop the consolidation process, start the VM and just deal with the cleanup later?”
Me: Once the consolidation process has started, it cannot be stopped; there is no specific control tool for this. Theoretically you can kill/restart the management services, but chances are very high that this will leave the disk in a corrupted state. Fixing that will be an enormous pain in the backside, not to speak of the manager… Have patience, it is done when it’s done.