Recovering a Proxmox VM from a failed HDD

After the earlier failure of my Goodram SSD, I was forced to switch to a brand-new 1 TB Toshiba HDD. That was not a problem, because the system running on it mostly writes and does not read much. The SSD had shown some performance drops, possibly because the server shared a power socket with some DIY tools in the garage. The socket is no longer shared, so this time I suspect I may have closed the server lid with too much force, and even the brand-new HDD failed.

Proxmox reported failure of disk access directly on the virtual machine:

The drive disappeared from the server. I remounted it and rebooted, still to no avail. I cleaned the connectors a little and blew on the drive's vent hole, all without success. So I used my LogiLink adapter to connect the drive to my workstation. The drive spun up, which means it is at least mechanically working.

I then connected the drive to another Proxmox server over USB, and it magically showed up as available again.

A quick look at the SMART values found no disaster: in particular, no read errors and no reallocated sectors. So the drive itself might be fine, while the file system on it is struggling.
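The attributes worth checking can be filtered out of the smartctl output with a small helper. This is only a sketch, assuming smartmontools is installed and the drive reports the usual ATA attribute table; the function name smart_summary and the device path /dev/sdX are placeholders of mine, and whether a given USB adapter needs the -d sat option depends on its bridge chip.

```shell
# Print the raw values of the SMART attributes that usually signal a dying disk.
# Reads the attribute table produced by `smartctl -A` on stdin.
# smart_summary is a hypothetical helper name, not part of smartmontools.
smart_summary() {
    awk '$2 ~ /^(Raw_Read_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        # Column 2 is the attribute name, column 10 the raw value.
        printf "%-24s %s\n", $2, $10
    }'
}

# Usage (as root; /dev/sdX is a placeholder for the recovered drive):
#   smartctl -A /dev/sdX | smart_summary
# Behind some USB-SATA bridges smartctl needs SAT passthrough:
#   smartctl -d sat -A /dev/sdX | smart_summary
```

Non-zero raw values for reallocated or pending sectors would point at the hardware rather than the file system.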

So the plan is to use the testdisk utility to read files straight from the drive, regardless of any problems with the partition table. The partition scheme can be checked with parted, fdisk and a few similar tools.
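A read-only look at the partition scheme could be sketched as below. The helper name inspect_disk is hypothetical, and /dev/sdX stands in for whatever device name the drive gets on your machine; none of these commands write to the disk.

```shell
# Read-only inspection of a disk's partition table.
# inspect_disk is a hypothetical helper; pass the device as the first argument.
inspect_disk() {
    # Block device tree with sizes and detected file systems.
    lsblk -o NAME,SIZE,TYPE,FSTYPE "$1"
    # Partition table as parted sees it (-s: script mode, never prompts).
    parted -s "$1" print
    # The same table as fdisk sees it.
    fdisk -l "$1"
}

# Usage (as root; /dev/sdX is a placeholder):
#   inspect_disk /dev/sdX
```

If the tools disagree with each other or report no table at all, that is a hint the partition table is what got damaged.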

Just run testdisk (install it first with apt install testdisk if needed):

Select failed drive:

Select partition table type:

Analyze:

Press P to list files:

Now you can navigate through the filesystem. In my case this works because the drive itself seems to be almost fine and the problem lies in the filesystem. In other cases, such as a drive that keeps dropping out or makes weird noises, your mileage may vary.

Once you have the qcow files holding your VM disk images, you can import them into another VM created without a disk:

qm importdisk VMID FILE DATASTORE
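Before importing, it may be worth checking that the recovered image is consistent. The sketch below wraps the check and the import step above in one function; import_recovered_disk is a hypothetical name, and the VM ID, file path and storage name in the usage comment are placeholders rather than values from my setup.

```shell
# import_recovered_disk is a hypothetical wrapper, not a Proxmox command.
# Arguments: VM ID, path to the recovered image, target storage name.
import_recovered_disk() {
    vmid="$1"; image="$2"; storage="$3"
    # Show the image format and virtual size.
    qemu-img info "$image" || return 1
    # Consistency check; stop if it reports any problem (leaked clusters included).
    qemu-img check "$image" || return 1
    # Import the image; it shows up on the VM as an unused disk.
    qm importdisk "$vmid" "$image" "$storage"
}

# Usage on the Proxmox host (all three values are placeholders):
#   import_recovered_disk 100 /mnt/recovered/vm-100-disk-0.qcow2 local-lvm
```

After the import, the unused disk still has to be attached to the VM, for example from the Hardware tab in the web UI.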

Remember to run the datastore in at least a mirrored setup (with md or ZFS) and to keep proper backups. Although I had some backups here, I decided to try recovering those files anyway, as a lesson for future real-world cases.