Alright, the inaugural post!
Background
One of the most recent challenges at work was to implement a system to back up our internal servers to both a local backup device and a remote backup device. This covers my ass in the event of a fire at the office, a hurricane, or whatever.
The existing backup was a 7-day rolling snapshot backup, which was better than nothing. However, that solution was not as elegant as what I am proposing here.
The inspiration for this solution comes from here where Mr. Rubel outlines how to create incremental backups using rsync with cp -al and also using rsync with --link-dest. For my purposes rsync with --link-dest was the answer.
The Solution (in Theory)
The solution is hard links. If I create a file called “unique” I have one physical file and a link to that file. If I then create another link to the same file called “not_unique” I still only have one physical file but there are now two links to it. If I remove one link, the file remains. If I remove all links, the file is destroyed. As usual, Wikipedia explains it well.
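To make the hard link behaviour concrete, here is a minimal sketch that reproduces the "unique" and "not_unique" example in a scratch directory. The variable names (workdir, same_inode, survivor) are mine and are only there for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hello" > unique      # one physical file, one link
ln unique not_unique       # a second link to the same physical file

# -ef is true when both names refer to the same inode on the same device.
if [ unique -ef not_unique ]; then
    same_inode=yes
else
    same_inode=no
fi

rm unique                  # remove one link; the data is still reachable
survivor=$(cat not_unique)

echo "same inode: $same_inode"
echo "content after removing one link: $survivor"
```

Only when not_unique is removed as well does the file system release the data.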
So, based on that theory, the idea is to create a complete snapshot backup once and only once. After that, incremental backups would be created using hard links.
- Any files that have been added since the last incremental backup would be created in the current increment.
- Any files that have been changed since the last increment would be re-created in the latest increment (which would preserve the state in previous backups).
- Any files that have been removed since the last increment would be removed from the current increment (which would again preserve the state in previous backups).
The only real hit to the disk is for files that are new or have changed. The hard links themselves take up a little space as directory entries, but it’s minimal.
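The three rules above can be sketched in plain shell. This is a toy stand-in for what rsync does with --link-dest, not rsync's actual logic: the snapshot function is hypothetical, it only handles a flat directory of regular files, and it compares file contents with cmp, whereas rsync by default compares size and modification time.

```shell
#!/bin/bash
# snapshot SRC PREV NEW
# Build NEW from SRC, hard-linking any file that is unchanged since PREV.
# Hypothetical helper for illustration; flat directories only.
snapshot() {
    local src=$1 prev=$2 new=$3 f name
    mkdir -p "$new"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if [ -f "$prev/$name" ] && cmp -s "$f" "$prev/$name"; then
            ln "$prev/$name" "$new/$name"   # unchanged: share the inode
        else
            cp -p "$f" "$new/$name"         # added or changed: fresh copy
        fi
    done
    # Files removed from SRC are simply never created in NEW.
}

work=$(mktemp -d)
mkdir "$work/src"

echo "v1" > "$work/src/existing_file"
snapshot "$work/src" "$work/none" "$work/day1"   # initial full copy

echo "new" > "$work/src/new_file"
snapshot "$work/src" "$work/day1" "$work/day2"   # a file was added

echo "v2" > "$work/src/existing_file"
snapshot "$work/src" "$work/day2" "$work/day3"   # a file was changed

rm "$work/src/existing_file"
snapshot "$work/src" "$work/day3" "$work/day4"   # a file was removed
```

After this runs, day1 and day2 share one copy of existing_file, day3 holds its own changed copy, and day4 has no existing_file at all, while the older snapshots are untouched.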
Demonstration
First, I create the initial backup. I’m using -i on the ls command to show the inode id of the file:
$ ls -lRi
.:
total 12
41451556 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:28 2012.11.06/
./2012.11.06:
total 12
41423324 -rw-r--r-- 1 scott scott 93 2012-11-06 22:28 existing_file
Next, I create the first increment with a new file added. Make note of the inode id in the first column, which shows the hard links: existing_file has the same inode id in both directories, and its link count has gone from 1 to 2:
$ ls -lRi
.:
total 16
41451556 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:28 2012.11.06/
41451557 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:32 2012.11.07/
./2012.11.06:
total 12
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
./2012.11.07:
total 16
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
41423328 -rw-r--r-- 1 scott scott 143 2012-11-06 22:32 new_file
Next, I create the second increment and update the existing file. The first two versions of the file have the same inode id and the third is different:
$ ls -lRi
.:
total 20
41451556 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:28 2012.11.06/
41451557 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:32 2012.11.07/
41451558 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:38 2012.11.08/
./2012.11.06:
total 12
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
./2012.11.07:
total 16
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
41423328 -rw-r--r-- 2 scott scott 143 2012-11-06 22:32 new_file
./2012.11.08:
total 12
41423329 -rw-r--r-- 1 scott scott 0 2012-11-06 22:38 existing_file
41423328 -rw-r--r-- 2 scott scott 143 2012-11-06 22:32 new_file
Finally, I create the third increment and remove the original file. The first two versions of the file have the same inode id as before, the third is different as before but the fourth version is missing as expected:
$ ls -lRi
.:
total 16
41451556 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:28 2012.11.06
41451557 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:32 2012.11.07
41451558 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:38 2012.11.08
41451560 drwxr-xr-x 2 scott scott 4096 2012-11-06 22:50 2012.11.09
./2012.11.06:
total 4
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
./2012.11.07:
total 8
41423324 -rw-r--r-- 2 scott scott 93 2012-11-06 22:28 existing_file
41423328 -rw-r--r-- 3 scott scott 143 2012-11-06 22:32 new_file
./2012.11.08:
total 4
41423329 -rw-r--r-- 1 scott scott 0 2012-11-06 22:38 existing_file
41423328 -rw-r--r-- 3 scott scott 143 2012-11-06 22:32 new_file
./2012.11.09:
total 4
41423328 -rw-r--r-- 3 scott scott 143 2012-11-06 22:32 new_file
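Rather than comparing inode ids by eye, you can ask find for every name that points at the same file. The sketch below rebuilds a small tree shaped like the last listing and uses find's -samefile test; the links_to function and the scratch directory are my own names, used only for this example.

```shell
#!/bin/bash
# links_to ROOT FILE
# Print every path under ROOT that is a hard link to FILE.
links_to() {
    find "$1" -samefile "$2" | sort
}

root=$(mktemp -d)
mkdir "$root/2012.11.07" "$root/2012.11.08" "$root/2012.11.09"

# new_file is shared by three increments, as in the listing above.
echo "data" > "$root/2012.11.07/new_file"
ln "$root/2012.11.07/new_file" "$root/2012.11.08/new_file"
ln "$root/2012.11.07/new_file" "$root/2012.11.09/new_file"

# An unrelated file that must not show up in the result.
echo "other" > "$root/2012.11.08/existing_file"

names=$(links_to "$root" "$root/2012.11.09/new_file")
count=$(printf '%s\n' "$names" | wc -l)

echo "$names"
echo "link count: $((count))"
```

The count should match the link count column that ls -l reports for new_file.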
The Solution (in Practice)
I created a starting point using the following command:
rsync -avzh --delete -e ssh SOURCE root@SERVER:PATH
- SOURCE is the absolute path on the source device.
- SERVER is your destination device (IP Address or domain name)
- PATH is the absolute path on the destination device where the backup will be created
Next, I created the first increment:
rsync -avzh --delete --link-dest=PATH1 -e ssh SOURCE root@SERVER:PATH2
- SOURCE is the absolute path on the source device.
- SERVER is your destination device (IP Address or domain name)
- PATH1 is the absolute path on the destination device of the previous backup. rsync hard-links unchanged files from it into the new increment instead of copying them again.
- PATH2 is the absolute path on the destination device where the backup will be created
Finally, I created a script to run via cron nightly:
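#!/bin/bash
# Today's snapshot directory, and yesterday's to hard-link unchanged files against.
# Note: -d "yesterday" requires GNU date.
CURRENT=PATH/$(date +'%Y.%m.%d')
LINKDEST=PATH/$(date +'%Y.%m.%d' -d "yesterday")
# Make sure the target directory exists on the backup server.
ssh root@SERVER mkdir -p "$CURRENT"
rsync -avzh --delete --link-dest="$LINKDEST" -e ssh SOURCE root@SERVER:"$CURRENT"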
- SOURCE is the absolute path on the source device.
- SERVER is your destination device (IP Address or domain name)
- PATH is the absolute path on the destination device where the backup will be created
This script requires that you have set up SSH authorized keys, so that cron can log in to the server without a password prompt. This is explained here.
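One weakness of always linking against yesterday's date is that a missed night leaves --link-dest pointing at a directory that does not exist, and as far as I know rsync then falls back to copying everything. A possible refinement, sketched below, is to pick the newest existing snapshot instead. The latest_snapshot function is hypothetical and assumes it runs where the snapshot directories are visible, so in the setup above it would have to run on the backup server (for example over ssh).

```shell
#!/bin/bash
# latest_snapshot ROOT TODAY
# Print the name of the newest YYYY.MM.DD directory under ROOT, ignoring TODAY.
# Prints an empty line if there is no earlier snapshot.
latest_snapshot() {
    local root=$1 today=$2 d name latest=""
    for d in "$root"/[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
        [ -d "$d" ] || continue
        name=${d##*/}
        if [ "$name" = "$today" ]; then
            continue
        fi
        # Zero-padded dates sort correctly as plain strings.
        if [[ -z $latest || $name > $latest ]]; then
            latest=$name
        fi
    done
    echo "$latest"
}

# Demonstration with a gap: no backup was taken on 2012.11.08.
root=$(mktemp -d)
mkdir "$root/2012.11.06" "$root/2012.11.07" "$root/2012.11.09"

previous=$(latest_snapshot "$root" "2012.11.09")
echo "link against: $previous"

empty=$(mktemp -d)
none=$(latest_snapshot "$empty" "2012.11.09")
echo "no earlier snapshot: '$none'"
```

The result would replace the date-based LINKDEST value, and when it comes back empty the script can simply run rsync without --link-dest to create a full backup.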