Backups: a personal implementation

stephane

July 5, 2008

If you’ve been following my blog for a while, you might have seen posts about SSH, RSYNC, ZFS snapshots and so on. This article describes the big picture and explains how I’ve been using those tools and technologies to build my own home backup system.

Basic setup

On the server side, I have a Solaris 10 box with a ZFS RAID-Z pool of 4 disks. RAID-Z is similar to RAID-5 and is a fair compromise between data safety (1 disk can fail without losing any data) and space consumption (the equivalent of 1 disk in the pool is used to store parity, which means a 25% loss with 4 disks, as opposed to a two-disk RAID-1 mirror, where the usable space is 50% of the raw disk space).
For more information about RAID levels, check the extensive Wikipedia article.
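To make the space arithmetic above concrete, here is a minimal sketch. It assumes equal-size disks and ignores file system overhead, so the figures are approximations; the function name and the layout labels are mine, chosen for illustration.

```python
def usable_fraction(layout: str, disks: int) -> float:
    """Approximate fraction of raw capacity left for data.

    Assumes equal-size disks and ignores metadata overhead.
    """
    if disks < 2:
        raise ValueError("need at least 2 disks")
    if layout == "raidz1":
        # Single parity: the equivalent of one disk is spent on parity.
        return (disks - 1) / disks
    if layout == "mirror":
        # Every disk holds a full copy of the data.
        return 1 / disks
    raise ValueError(f"unknown layout: {layout}")


print(usable_fraction("raidz1", 4))  # 0.75, i.e. a 25% loss
print(usable_fraction("mirror", 2))  # 0.5
```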

For each machine I want to back up, I set up a dedicated ZFS file system in the zpool. This allows finer control, for example to enforce quotas or to set different parameters as needed.
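As a rough sketch of the per-machine setup, the helper below only builds the `zfs create` command line and prints it; it does not run anything. The pool name `tank`, the machine name `laptop` and the 50G quota are hypothetical values, not the ones from my server.

```python
import shlex


def zfs_create_command(pool, machine, quota=None):
    """Build the command that creates one ZFS file system per machine."""
    cmd = ["zfs", "create"]
    if quota is not None:
        # -o sets a property at creation time, here a space quota.
        cmd += ["-o", f"quota={quota}"]
    cmd.append(f"{pool}/{machine}")
    return cmd


# Hypothetical pool and machine names, for illustration only.
print(shlex.join(zfs_create_command("tank", "laptop", quota="50G")))
# zfs create -o quota=50G tank/laptop
```

On the server, the resulting list could be handed to `subprocess.run` by a user allowed to administer the pool.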

SSH and RSYNC are installed on the server.

On the client side, a partition /toSave is created and holds everything that must be backed up. On Unix machines, for example, /home is a link to /toSave/home. On Windows, the “My Documents” folder of every user is moved to D:\toSave.

Again, SSH and RSYNC are installed. On Windows, you can use cwRsync, a Windows build of RSYNC.

A script is set up to synchronize this toSave folder to the backup server. In my case it is run manually, but if your computers are up and running 24/7 you could automate this (via crontab on Unix or scheduled tasks on Windows).
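The synchronization script can be as small as a single rsync call. The sketch below is not my actual script: it only assembles a plausible command line, and the user name, host name and destination path are made up for the example. The flags are standard rsync ones: `-a` (archive mode), `-z` (compression), `-e ssh` (transport over SSH) and `--delete` (remove files on the server that no longer exist on the client).

```python
import shlex


def rsync_command(source_dir, user, server, dest_dir, delete=True):
    """Build an rsync-over-SSH command mirroring source_dir to the server."""
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Propagate deletions so the server copy stays an exact mirror.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the folder itself.
    cmd += [source_dir.rstrip("/") + "/", f"{user}@{server}:{dest_dir}"]
    return cmd


# Hypothetical user, host and destination, for illustration only.
print(shlex.join(rsync_command("/toSave", "stephane", "backupserver", "/tank/laptop/")))
# rsync -az -e ssh --delete /toSave/ stephane@backupserver:/tank/laptop/
```

Printing the command first makes it easy to check before anything is transferred or deleted; the same list can then be passed to `subprocess.run`.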

Improving the basic setup

At this point we have a synchronized copy of all the valuable data on the backup server. But this is not quite enough. Imagine you made an unwanted edit to a document a few days ago and you want the previous version back? Not possible. Or what if you deleted a document? As the backup server is sync’d, the document has been deleted there as well.

This is where ZFS snapshots come in handy: every time you back up a machine, create a snapshot of its dedicated ZFS file system on the backup server (remember, each machine has its own ZFS file system there). I name the snapshots by suffixing the date to the name of the machine (example: machine@2008-06-12) so I can easily find files as they were on a given date. This is cheap with respect to space consumption (at least at the beginning; a snapshot takes up more space as the live data diverges from it) and lets you go back in time.
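The naming scheme above maps directly onto the `zfs snapshot` command, whose argument has the form filesystem@snapshotname. Here is a small sketch that builds that command for a given date; the pool name `tank` is an assumption, and the function only returns the command without running it.

```python
import datetime
import shlex


def snapshot_command(dataset, day):
    """Build the command that snapshots a dataset, named after the date."""
    # isoformat() yields YYYY-MM-DD, so snapshot names sort chronologically.
    return ["zfs", "snapshot", f"{dataset}@{day.isoformat()}"]


# "tank" is a hypothetical pool name.
print(shlex.join(snapshot_command("tank/machine", datetime.date(2008, 6, 12))))
# zfs snapshot tank/machine@2008-06-12
```

In a real run you would pass `datetime.date.today()` right after the rsync transfer has finished.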

Advantages

This backup system is fast: transfers are made with RSYNC, and RSYNC only sends what has changed since the last sync.
This backup system is secure: transfers go through SSH, so the network communication is encrypted and no one can spy on your data as they flow to the backup server. Data are stored on a RAID-Z volume, so should a disk fail, your data are still safe (replace the failed disk quickly though ;))
This backup system is convenient: RSYNC lets you script and automate the entire thing.
This backup system lets you go back in time as you would expect, and N backups at different times don’t mean storing N times the volume of your data (thanks to ZFS snapshots being space efficient).
This is an on-disk backup system, which means it is very fast to find and restore files.
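To illustrate the last point: ZFS exposes each snapshot read-only under the hidden `.zfs/snapshot` directory at the root of the file system, so restoring a file is an ordinary copy. The sketch below just computes where a file would live in a given snapshot; the mount point `/tank/machine` and the file name are hypothetical.

```python
from pathlib import PurePosixPath


def snapshot_path(mountpoint, snapshot, relative_file):
    """Return the path of a file as it was when the snapshot was taken."""
    # Strip a leading slash so the file path stays under the snapshot directory.
    relative = relative_file.lstrip("/")
    return PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot / relative


# Hypothetical mount point and file, for illustration only.
print(snapshot_path("/tank/machine", "2008-06-12", "home/stephane/report.txt"))
# /tank/machine/.zfs/snapshot/2008-06-12/home/stephane/report.txt
```

Copying that path back with `cp` (or pulling it with RSYNC from the client) restores the old version.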

Drawbacks

Obviously you don’t get the benefits of tape backups (such as being able to send them to a remote location), so if your backup server catches fire, you’ve lost your data history. This is acceptable for my personal data.

Conclusion

Thanks to Solaris’ ZFS and other open source tools such as OpenSSH and RSYNC, I was easily able to tailor my own customized backup system, which perfectly matches my needs.

How do you perform your backups? Any special tricks? Unique features?