Is there a way to back up a Mercurial repository while preserving the files' timestamps?
Right now, I'm using hg clone to copy the repository to a staging directory, and the backup program picks up the files from there. I'm not pointing the backup program directly at the repository because I don't want it to be changing (from commits) while the backup is happening.
The problem is that hg clone changes all the files' timestamps to the current time, so the backup program (which I cannot change) thinks everything has been modified.
Plan A: When the source and destination directories reside on the same file system, hg clone -U simply hardlinks the repository's files instead of copying them, so their timestamps do not change. This approach is quite fast and safe: Mercurial breaks a hardlink lazily, just before it writes to the file, so later commits to the original never modify the clone.
If you need to, you can clone on the same file system first, and then rsync this new clone over to another file system.
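To see why a hardlinked clone keeps its timestamps, here is a small sketch in plain Python. It does not call Mercurial at all; it only illustrates the file system behaviour that Plan A relies on, namely that a hard link is a second name for the same inode and therefore shares its modification time. The file names are made up for the illustration.

```python
import os
import tempfile
import time


def hardlink_keeps_mtime(directory):
    """Create a file with an old mtime, hard-link it, and compare the two names."""
    original = os.path.join(directory, "a.txt.i")      # hypothetical file name
    linked = os.path.join(directory, "a.txt.i.link")   # hypothetical file name

    with open(original, "wb") as f:
        f.write(b"some revlog data")

    # Pretend the file was last modified a day ago.
    yesterday = time.time() - 24 * 60 * 60
    os.utime(original, (yesterday, yesterday))

    # A hard link adds a second directory entry for the same inode;
    # no data is copied and no timestamp is touched.
    os.link(original, linked)

    first = os.stat(original)
    second = os.stat(linked)
    return first.st_ino == second.st_ino and first.st_mtime == second.st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hardlink_keeps_mtime(tmp))
```

A plain copy, by contrast, creates a new inode whose modification time is the moment of the copy unless the tool explicitly restores it.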
Plan B: It's usually safe to use rsync or some other file-based synchronization tool. Mercurial doesn't store anything magical on disk, just plain files.
There is a race condition if you happen to commit to the repository while rsync is running, but I think the risk is negligible: if you restore from a backup that caught a half-finished transaction, "hg rollback" should be able to clean up the inconsistency. Do note that rollback cannot recover if multiple separate transactions (such as multiple "push" or "commit" commands) happened in the rsync window, or if you ran destructive operations that tamper with history (such as rebase, hg strip, and some MQ commands).
I suggest using hg pull instead of hg clone. That way you keep a mirror of the repository on your server and update it periodically with hg pull. You then let your backup program take a backup of that mirror. When you use hg pull, you transfer only the newest history, and the only files under .hg/store/data that change are the ones actually affected by the pull.
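As a rough sketch of that workflow, the script below clones on the first run and pulls on every later run. It assumes hg is on the PATH; the function names and the source and mirror paths you pass in are placeholders for illustration, not anything Mercurial prescribes.

```python
import os
import subprocess


def mirror_command(source, mirror):
    """Return the hg command line that creates or refreshes the mirror."""
    if os.path.isdir(os.path.join(mirror, ".hg")):
        # The mirror already exists: fetch only the new changesets.
        return ["hg", "pull", "--repository", mirror, source]
    # First run: copy the history without creating a working copy.
    return ["hg", "clone", "--noupdate", source, mirror]


def update_mirror(source, mirror):
    """Run the command; raises subprocess.CalledProcessError if hg fails."""
    subprocess.run(mirror_command(source, mirror), check=True)
```

You would call update_mirror from cron (or whatever scheduler you use) shortly before the backup program runs, and point the backup program at the mirror directory.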
Here I tested this by making a small repo with two files: a.txt and b.txt. I then cloned the repository "to the server" using hg clone --noupdate. That ensures that we have no working copy on the server -- it only needs the history found in .hg.
The timestamps looked like this after the clone:
% ll --time-style=full .hg/store/data
total 8.0K
-rw-r--r-- 1 mg mg 76 2009-11-25 20:07:52.000000000 +0100 a.txt.i
-rw-r--r-- 1 mg mg 69 2009-11-25 20:07:52.000000000 +0100 b.txt.i
As you noted, they are all identical since the files were all just created by the clone operation. I then changed the original repository (the one on the client) and made a commit. After pulling the changeset I got these timestamps:
% ll --time-style=full .hg/store/data
total 8.0K
-rw-r--r-- 1 mg mg 159 2009-11-25 20:08:47.000000000 +0100 a.txt.i
-rw-r--r-- 1 mg mg 69 2009-11-25 20:07:52.000000000 +0100 b.txt.i
Notice how the timestamp for a.txt.i has been updated (I only touched a.txt in my commit) while the timestamp for b.txt.i has been left alone.
If your backup software is smart, it will even notice that Mercurial has only appended data to a.txt.i. This means that the new a.txt.i file is identical to the old a.txt.i file up to a certain point -- the backup program should therefore only copy the final part of the file. Rsync is an example of a backup program that will notice this.
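To make the "identical up to a certain point" idea concrete, here is a simplified check that tells you whether one file is just an older file with data appended. This is only an illustration of the property; it is not how rsync detects it (rsync uses its own checksum-based delta algorithm), and the function name is made up.

```python
import os


def is_appended_version(old_path, new_path, chunk_size=64 * 1024):
    """Return True if new_path begins with the exact contents of old_path."""
    if os.path.getsize(new_path) < os.path.getsize(old_path):
        return False  # a shorter file cannot be an appended version
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            old_chunk = old.read(chunk_size)
            if not old_chunk:
                return True  # reached the end of the old file without a mismatch
            if new.read(len(old_chunk)) != old_chunk:
                return False
```

If this returns True for the old and new copies of a.txt.i, a backup tool only needs to store the bytes after the old file's length.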
Here's an hg extension that might help: TimestampExtension.