November 17, 2014

Space-Efficient Hard Drive Images with dd

dd copies every single bit of data from a source volume to a destination. That is useful for some applications, but it has drawbacks when you simply want to back up a volume: even if your volume is only, let's say, half full, the copy will be as big as the entire volume. Furthermore, when you pipe the data stream read by dd through gzip, it often won't compress well. Why? Because the deleted data (temporary files, your last deleted download of Wikipedia, etc.) is still on the volume. Deleting a file only removes the file system's reference to it and marks its blocks as free; the bytes themselves stay where they are until something overwrites them. dd reads those leftover bytes along with everything else, and gzip has to compress them too. It's the same reason why “deleted” data is not really gone and can be restored with the right tools.
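To see the effect, here is a small sketch that gzips 8 MiB of random bytes and 8 MiB of zeros and prints both compressed sizes. Using /dev/urandom as a stand-in for leftover file contents is an assumption and a worst case: leftover text compresses better than random bytes, while already-compressed files (archives, videos, images) behave much like them.

```shell
#!/bin/sh
# Compare how well gzip compresses leftover "deleted" data versus zeros.
# /dev/urandom stands in for old file contents (worst case, see above).
set -e
size=8388608   # 8 MiB of each kind of data

stale=$(head -c "$size" /dev/urandom | gzip -9 | wc -c)
zeros=$(head -c "$size" /dev/zero | gzip -9 | wc -c)

# $((...)) strips the leading spaces some wc implementations print.
echo "8 MiB of random leftovers -> $((stale)) bytes gzipped"
echo "8 MiB of zeros            -> $((zeros)) bytes gzipped"
```

The random data does not shrink at all, while the zeros end up a tiny fraction of their original size.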

A simple but quite time-consuming solution is to wipe the free space of the volume, i.e., to fill all of its remaining free space with zeros. This can easily be done by executing something like

mount /dev/volume /mountpoint/for/volume/
dd if=/dev/zero of=/mountpoint/for/volume/zero.img bs=4m

Filling half a terabyte of free space took about 3 to 4 hours on a more or less decent machine with a hard drive (one of those nasty clicking and whirring things with rotating metal disks inside).

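Let this run until dd stops with a "No space left on device" error; at that point the entire free space on the volume is filled. (bs=4m is the BSD/macOS spelling of a 4 MiB block size; GNU dd on Linux expects bs=4M.) Now run sync so the zeros actually reach the disk, delete zero.img, unmount the volume and pipe the whole volume through dd | gzip: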

dd if=/dev/volume bs=4m | gzip -9 > /path/to/destination/volume.img.gz
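The steps so far can be wrapped into a single shell function. This is a sketch under assumptions, not a tested tool: the function name backup_volume is hypothetical, it must run as root, and the three arguments are the placeholders from the commands above. It uses a plain 1 MiB block size (bs=1048576) so it works with both GNU and BSD dd.

```shell
#!/bin/sh
# Hypothetical helper: wipes the free space of a volume, then images it.
# Must run as root; all three arguments are placeholders you supply.
#   $1  block device, e.g. /dev/volume
#   $2  mount point, e.g. /mountpoint/for/volume
#   $3  destination file, e.g. /path/to/destination/volume.img.gz
backup_volume() {
    device=$1
    mountpoint=$2
    dest=$3

    mount "$device" "$mountpoint" || return 1

    # Fill all free space with zeros. dd is expected to stop with
    # "No space left on device", so its non-zero exit status is ignored.
    dd if=/dev/zero of="$mountpoint/zero.img" bs=1048576 || true

    # Flush the zeros to disk before deleting the file, so cached pages
    # of zero.img that were never written are not simply dropped.
    sync
    rm -f "$mountpoint/zero.img"
    umount "$mountpoint" || return 1

    # Image the whole device; the zeroed free space compresses very well.
    dd if="$device" bs=1048576 | gzip -9 > "$dest"
}
```

Usage, as root: backup_volume /dev/volume /mountpoint/for/volume/ /path/to/destination/volume.img.gz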

Since gzip compresses long runs of zeros very efficiently, the resulting volume.img.gz is not much larger than the compressed data actually in use, i.e., quite small.
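The effect can be tried without root or a spare disk by letting a regular file play the role of the volume. In this sketch the file volume.raw is a hypothetical stand-in made of 1 MiB of random "used" data followed by 15 MiB of zeros (already wiped free space); it runs the same dd | gzip pipeline and then checks that decompressing reproduces the stand-in bit for bit.

```shell
#!/bin/sh
# Stand-in demo: a regular file plays the role of /dev/volume, so no root
# or real device is needed. volume.raw is a hypothetical scratch file.
set -e
work=$(mktemp -d)
cd "$work"

# 1 MiB of random bytes as "used" data, followed by 15 MiB of zeros
# standing in for free space that has already been wiped.
head -c 1048576 /dev/urandom > volume.raw
head -c 15728640 /dev/zero >> volume.raw

# The same dd | gzip pipeline as above (1 MiB blocks instead of 4m).
dd if=volume.raw bs=1048576 2>/dev/null | gzip -9 > volume.img.gz

raw_size=$(wc -c < volume.raw)
gz_size=$(wc -c < volume.img.gz)
echo "raw: $((raw_size)) bytes, compressed: $((gz_size)) bytes"

# Decompressing must reproduce the stand-in volume bit for bit.
if gunzip -c volume.img.gz | cmp -s - volume.raw; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"

cd / && rm -rf "$work"
```

The compressed file should come out only slightly larger than the 1 MiB of random data, because the 15 MiB of zeros shrink to well under 100 KB.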

That may not be the quickest or easiest solution, but I think it is still better than using some (proprietary) tool that won't understand its own outdated image format when a new version comes out in two years. And it's free, …

© holger 2015 - 2020