I've been trying to figure out my backup strategy at home. The data I want to keep, mainly critical documents and home photos and movies, comes to about 70GB. My current strategy is to keep copies on several machines at home, because I'm trying to avoid cloud storage. I've been using unison, a great file synchronizer that works well for backups. Some of its cool features are:

  • syncs between two machines, across any OS
  • tells you what files changed whenever you run it, and lets you override its default guesses as to which way to copy/delete files
  • pretty fast incremental backup (under a minute to detect all diffs on my data set)
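
Unison's fast incremental runs come largely from comparing each machine's files against an archive of their state saved at the end of the previous sync, using cheap metadata checks instead of re-reading every file. Below is a minimal Python sketch of that idea, assuming a (size, modification time) pair is a good-enough change signal. Unison's real archive format and checks are more involved, and `take_snapshot` and `diff_snapshots` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical snapshot format: relative path -> (size in bytes, mtime in nanoseconds)
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Walk `root` and record the size and modification time of every file."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Compare the previous run's snapshot with the current one using metadata only."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        # A file counts as changed if its size or mtime differs; contents are never read.
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
    }
```

In use, you would save the snapshot at the end of each run and diff against it at the start of the next. Because only metadata is read, a run over tens of gigabytes costs little more than a directory walk. The trade-off is that a file whose contents change without its size or mtime changing would be missed, which is why unison has an option to fingerprint file contents instead.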

Its only major downside is that it's not actively maintained. However, it has been heavily tested, and I've used it reliably for years, so I consider it a solid app. It's designed never to leave your system in a bad state, even if a sync is interrupted. And it's open source, though it's written in OCaml, which is not the most widely known language.
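
One common way a sync tool can avoid leaving half-written files behind is to write the new contents to a temporary file in the same directory and then atomically rename it over the original. Here is a minimal Python sketch of that general pattern, assuming a POSIX filesystem. It illustrates the technique, not unison's actual code, and `atomic_write` is a hypothetical name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX when both paths are on the same filesystem
    except BaseException:
        # On any failure (including Ctrl-C), discard the temp file; the original is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace`, the original file is still intact and only a stray temp file remains. One catch: `mkstemp` creates the file readable and writable only by its owner, so a real tool would also copy over the original file's permissions.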

For a completely different approach, I looked into using a source-control system for backups, specifically the distributed source control system Mercurial (hg). After some testing, here are the problems I ran into:

  1. hg crashes on big files (you need 3x to 5x as much RAM as the size of your largest file). However, the largefiles extension, which ships with hg but is turned off by default, can work around this problem.
  2. hg does not preserve modification times of files. There are a few extensions which can work around this, but I wasn't really happy with them.
  3. you will need at least double your storage when using hg, since hg (like all major distributed source control systems) stores a copy of every file you add in a hidden folder (.hg)
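
One way to work around problem 2 is to record each file's modification time in a manifest that you commit along with your data, then reapply those times after an update. Below is a minimal Python sketch, assuming the manifest lives inside the working copy. `save_mtimes`, `restore_mtimes`, and the `.mtimes.json` file name are hypothetical, not the API of any real hg extension.

```python
import json
import os


def save_mtimes(root: str, manifest: str) -> None:
    """Record each working-copy file's modification time (ns) in a JSON manifest."""
    manifest_abs = os.path.abspath(manifest)
    mtimes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into the repository's hidden store.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) == manifest_abs:
                continue  # don't record the manifest itself
            mtimes[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    with open(manifest, "w") as f:
        json.dump(mtimes, f, indent=2, sort_keys=True)


def restore_mtimes(root: str, manifest: str) -> int:
    """Reapply recorded modification times after an update; return how many files were set."""
    with open(manifest) as f:
        mtimes = json.load(f)
    restored = 0
    for rel, mtime_ns in mtimes.items():
        full = os.path.join(root, rel)
        if os.path.isfile(full):  # files deleted since the manifest was written are skipped
            atime_ns = os.stat(full).st_atime_ns
            os.utime(full, ns=(atime_ns, mtime_ns))  # keep atime, restore mtime
            restored += 1
    return restored
```

You would run `save_mtimes` before each `hg commit` and `restore_mtimes` after each `hg update`. That extra bookkeeping step is part of why such workarounds feel bolted on.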

I didn't test git, but it will definitely have problems #2 and #3 above. Oh well, back to unison for me for the time being.

P.S. I also looked at git-annex, since it is clearly aimed at someone like me. However, its design seemed a little rickety to me: as far as I can tell, it replaces all your files with symlinks, makes them read-only by default (you have to explicitly "unlock" a file before changing it), and forks git commands under the covers. I think a backup solution needs to be simple and shouldn't have too many dependencies.
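
The symlink design described above can be sketched in a few lines: hash the file's contents, move the file into an object store named by that hash, make the stored copy read-only, and leave a symlink behind. This is a simplified Python illustration, assuming a POSIX filesystem with symlink support and a store on the same filesystem. git-annex's real key format, object layout, and `git annex unlock` behavior differ, and `annex_add` and `annex_unlock` are hypothetical names.

```python
import hashlib
import os
import shutil
import stat


def annex_add(path: str, store: str) -> str:
    """Move `path` into a content-addressed store and leave a symlink in its place."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    key = h.hexdigest()
    os.makedirs(store, exist_ok=True)
    target = os.path.abspath(os.path.join(store, key))
    if os.path.exists(target):
        os.remove(path)  # identical content is already stored; keep one copy
    else:
        os.replace(path, target)  # assumes the store is on the same filesystem
        # Strip write bits so the content can't be edited through the symlink.
        mode = stat.S_IMODE(os.stat(target).st_mode)
        os.chmod(target, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    os.symlink(target, path)
    return key


def annex_unlock(path: str) -> None:
    """Swap the symlink for a private, writable copy so the file can be edited."""
    target = os.path.realpath(path)
    os.remove(path)  # removes only the symlink, not the stored object
    shutil.copyfile(target, path)  # the new file gets default (writable) permissions
```

Even this toy version shows the trade-offs mentioned above: the working tree is full of symlinks, and editing a file requires an explicit unlock step first.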

