
Fully backup a git repo?

Tags:

git

backup



git bundle

I like that method, as it results in only one file, easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command

git bundle create /tmp/foo-all --all

is detailed:

git bundle will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.


For using that bundle, you can clone it, specifying a non-existent folder (outside of any git repo):

git clone /tmp/foo-all newFolder
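The round trip described above can be tried end to end with a minimal sketch like the following. The scratch directory, the demo commit identity, and the tag name are assumptions made only so the example runs on its own; `git bundle verify` is an extra check that the answer does not mention.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway locations so the sketch can run anywhere.
work=$(mktemp -d)
src="$work/src"

# A tiny source repository with one commit and one tag.
git init -q "$src"
git -C "$src" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"
git -C "$src" tag v1

# Pack everything reachable from any ref into a single file.
git -C "$src" bundle create "$work/foo-all" --all

# Check the file before trusting it as a backup.
git -C "$src" bundle verify "$work/foo-all"

# Restore by cloning the bundle into a folder that does not exist yet.
git clone -q "$work/foo-all" "$work/newFolder"
restored_tags=$(git -C "$work/newFolder" tag --list)
echo "tags in restored clone: $restored_tags"
```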

What about just making a clone of it?

git clone --mirror other/repo.git

Every repository is a backup of its remote.


Expanding on the great answers by KingCrunch and VonC

I combined them both:

git clone --mirror [email protected]/reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all

After that you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from that using git clone reponame.bundle reponame.

Note that git bundle only copies commits that are reachable from some reference (branch or tag) in the repository, so dangling commits are not stored in the bundle.
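A small experiment can make the point about unreachable commits concrete. This is only a sketch: the scratch repository, the `demo_commit` helper, and the commit messages are invented for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)          # hypothetical scratch directory
src="$work/src"
git init -q "$src"

# Helper for this demo only: create an empty commit with a fixed identity.
demo_commit() {
    git -C "$src" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$1"
}

demo_commit "kept"
demo_commit "about to become unreachable"
dangling=$(git -C "$src" rev-parse HEAD)

# Move the branch back; the second commit now hangs off no branch or tag.
git -C "$src" reset -q --hard HEAD~1

git -C "$src" bundle create "$work/reponame.bundle" --all
git clone -q "$work/reponame.bundle" "$work/reponame"

# The object still exists in the source but was not written to the bundle.
if git -C "$work/reponame" cat-file -e "$dangling" 2>/dev/null; then
    in_restore=yes
else
    in_restore=no
fi
echo "dangling commit present in restored clone: $in_restore"
```

If such commits matter (for example, work that is only recoverable through the reflog), a bundle alone is not enough and a file-level copy of the repository is needed as well.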


Expanding on some other answers, this is what I do:

Setup the repo: git clone --mirror user@server:/url-to-repo.git

Then when you want to refresh the backup: git remote update from the clone location.

This backs up all branches and tags, including new ones that get added later, although it's worth noting that branches that get deleted do not get deleted from the clone (which for a backup may be a good thing).

This is atomic so doesn't have the problems that a simple copy would.

See http://www.garron.me/en/bits/backup-git-bare-repo.html
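The setup and refresh steps, including the behaviour for deleted branches, can be sketched as follows. A local scratch repository stands in for user@server:/url-to-repo.git, and the branch name `feature` and the `has_feature` helper are made up for the demo. The `--prune` step and the final `git fsck` are additions that the answer itself does not use.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                 # hypothetical scratch directory
src="$work/src"                   # stands in for user@server:/url-to-repo.git
backup="$work/backup.git"

git init -q "$src"
git -C "$src" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"

# Demo helper: report whether the backup currently has the branch.
has_feature() {
    git -C "$backup" show-ref --verify --quiet refs/heads/feature
}

# One-time setup of the backup.
git clone -q --mirror "$src" "$backup"

# A branch appears upstream, then the backup is refreshed.
git -C "$src" branch feature
git -C "$backup" remote update
after_refresh=$(has_feature && echo present || echo absent)

# The branch is deleted upstream; a plain refresh keeps it in the backup.
git -C "$src" branch -D feature
git -C "$backup" remote update
after_delete=$(has_feature && echo present || echo absent)

# Only an explicit prune drops it from the mirror.
git -C "$backup" remote update --prune
after_prune=$(has_feature && echo present || echo absent)

echo "refresh: $after_refresh, upstream delete: $after_delete, prune: $after_prune"

# Integrity check of the backup itself.
git -C "$backup" fsck
```

For a backup, leaving out `--prune` is usually the safer default, since an accidental branch deletion upstream then does not propagate.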


This thread was very helpful for getting some insight into how backups of git repos can be done. I think it still lacks some hints, information, or a conclusion that would help readers find the "correct way" (tm) for themselves. I am therefore sharing my thoughts here to help others and to put them up for discussion so they can be improved. Thanks.

Starting from the original question:

  • Goal is to get as close as possible to a "full" backup of a git repository.

Then enriching it with the typical wishes and specifying some presettings:

  • Backup via a "hot-copy" is preferred to avoid service downtime.
  • Shortcomings of git will be worked around by additional commands.
  • A script should do the backup to combine the multiple steps for a single backup and to avoid human mistakes (typos, etc.).
  • Additionally a script should do the restore to adapt the dump to the target machine, e.g. even the configuration of the original machine may have changed since the backup.
  • Environment is a git server on a Linux machine with a file system that supports hardlinks.

1. What is a "full" git repo backup?

The point of view differs on what a "100%" backup is. Here are two typical ones.

#1 Developer's point of view

  • Content
  • References

git is a developer tool and supports this point of view via git clone --mirror and git bundle create <file> --all.

#2 Admin's point of view

  • Content files
    • Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
  • git configuration
    • see https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
    • docs: man git-config, man gitignore
    • .git/config
    • .git/description (for hooks and tools, e.g. post-receive-email hook, gitolite, GitWeb, etc.)
    • .git/hooks/
    • .git/info/ (repository exclude file, etc.)
  • Optional: OS configuration (file system permissions, etc.)

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be treated as separate from the backup of the content.

2. Techniques

  • "Cold-Copy"
    • Stop the service to have exclusive access to its files. Downtime!
  • "Hot-Copy"
    • Service provides a fixed state for backup purposes. On-going changes do not affect that state.

3. Other topics to think about

Most of them are generic for backups.

  • Is there enough space to hold the full backups? How many generations will be stored?
  • Is an incremental approach wanted? How many generations will be stored and when to create a full backup again?
  • How to verify that a backup is not corrupted after creation or over time?
  • Does the file system support hardlinks?
  • Put backup into a single archive file or use directory structure?

4. What git provides to backup content

  • git gc --auto

    • docs: man git-gc
    • Cleans up and compacts a repository.
  • git bundle create <file> --all

    • docs: man git-bundle, man git-rev-list
    • Atomic = "Hot-Copy"
    • Bundles are dump files and can be directly used with git (verify, clone, etc.).
    • Supports incremental extraction.
    • Verifiable via git bundle verify.
  • git clone --mirror

    • docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
    • Atomic = "Hot-Copy"
    • Mirrors are real git repositories.
    • Primary intention of this command is to build a full active mirror, that periodically fetches updates from the original repository.
    • Supports hardlinks for mirrors on same file system to avoid wasting space.
    • Verifiable via git fsck.
    • Mirrors can be used as a basis for a full file backup script.
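The incremental and verification features listed above can be combined roughly as in this sketch. It follows the marker-tag pattern from man git-bundle; the scratch paths, the branch name `main`, the `demo_commit` helper, and the tag name `lastbackup` are assumptions made for the demo.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                 # hypothetical scratch directory
src="$work/src"
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main   # fixed branch name for the demo

# Helper for this demo only: create an empty commit with a fixed identity.
demo_commit() {
    git -C "$src" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$1"
}

# Full backup, then remember what it covered with a marker tag.
demo_commit "first"
git -C "$src" bundle create "$work/full.bundle" --all
git -C "$src" tag -f lastbackup main

# Incremental backup: only what was added since the marker.
demo_commit "second"
git -C "$src" bundle create "$work/incr.bundle" lastbackup..main
git -C "$src" tag -f lastbackup main

# Restore: full bundle first, then apply the increment on top.
restored="$work/restored.git"
git clone -q --mirror "$work/full.bundle" "$restored"
# verify checks that the commit the increment builds on is already present.
git -C "$restored" bundle verify "$work/incr.bundle"
git -C "$restored" fetch -q "$work/incr.bundle" 'refs/heads/*:refs/heads/*'
git -C "$restored" fsck
```

An incremental bundle is only usable together with everything it builds on, so the chain of bundles has to be kept and restored in order.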

5. Cold-Copy

A cold-copy backup can always do a full file backup: deny all accesses to the git repos, do backup and allow accesses again.

  • Possible Issues
    • May not be easy - or even possible - to deny all accesses, e.g. shared access via file system.
    • Even if the repo is on a client-only machine with a single user, then the user still may commit something during an automated backup run :(
    • Downtime may not be acceptable on server and doing a backup of multiple huge repos can take a long time.
  • Ideas for Mitigation:
    • Prevent direct repo access via file system in general, even if clients are on the same machine.
    • For SSH/HTTP access use git authorization managers (e.g. gitolite) to dynamically manage access or modify authentication files in a scripted way.
    • Backup repos one-by-one to reduce downtime for each repo. Deny one repo, do backup and allow access again, then continue with the next repo.
    • Have a planned maintenance schedule to avoid upsetting developers.
    • Only back up when the repository has changed. This may be very hard to implement, e.g. list of objects plus having packfiles in mind, checksums of config and hooks, etc.
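For the last idea, one partial approach is to fingerprint the refs and skip the backup when the fingerprint is unchanged. The sketch below is an assumption about how this could be done, not an established recipe: `refs_fingerprint` is a made-up helper, and it deliberately ignores config, hooks, packfile layout, and unreachable objects, which is exactly the hard part noted above.

```shell
#!/bin/sh
set -eu

# Hash of every ref name together with the object it points to. Two calls
# return the same value only if no branch or tag was added, moved, or removed.
refs_fingerprint() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)' \
        | git -C "$1" hash-object --stdin
}

# Demo on a hypothetical scratch repository.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"

before=$(refs_fingerprint "$repo")
unchanged=$(refs_fingerprint "$repo")

git -C "$repo" tag v1             # any ref change alters the fingerprint
after=$(refs_fingerprint "$repo")

if [ "$before" = "$after" ]; then
    echo "no ref changes, backup can be skipped"
else
    echo "refs changed, run the backup"
fi
```

A backup script could store the last fingerprint next to the backup and compare it at the start of each run.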

6. Hot-Copy

File backups cannot safely be done on active repos, because on-going commits can leave the copy in a corrupted state. A hot-copy provides a fixed state of an active repository for backup purposes. On-going commits do not affect that copy. As listed above, git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.

"100% admin" hot-copy backup

  • Option 1: use git bundle create <file> --all to create full/incremental dump files of content and copy/backup configuration files separately.
  • Option 2: use git clone --mirror, handle and copy configuration separately, then do full file backup of mirror.
    • Notes:
    • A mirror is a new repository, that is populated with the current git template on creation.
    • Clean up configuration files and directories, then copy configuration files from original source repository.
    • Backup script may also apply OS configuration like file permissions on the mirror.
    • Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.
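Option 2 could look roughly like the sketch below. `backup_repo` is a hypothetical helper, not a git command, and the choice to store the source config as `config.source` (rather than overwriting the mirror's own config) is an assumption made so the mirror can still fetch from its origin. OS-level settings such as ownership are not handled here.

```shell
#!/bin/sh
set -eu

# backup_repo SRC_GIT_DIR DEST: hypothetical helper for this sketch.
backup_repo() {
    src_git_dir=$1   # path to the source .git directory (or bare repo)
    dest=$2          # path of the mirror to create

    # Hot copy of content and refs; a local clone hardlinks objects when
    # both paths are on the same filesystem.
    git clone -q --mirror "$src_git_dir" "$dest"

    # The mirror was populated from the local git template, so replace those
    # files with the ones from the source repository.
    for item in description hooks info; do
        rm -rf "${dest:?}/$item"
        if [ -e "$src_git_dir/$item" ]; then
            cp -Rp "$src_git_dir/$item" "$dest/$item"
        fi
    done

    # Assumption: keep the source config beside the mirror's own config.
    cp -p "$src_git_dir/config" "$dest/config.source"

    git -C "$dest" fsck
}

# Demo on a hypothetical scratch repository with a custom hook.
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"
mkdir -p "$work/src/.git/hooks"
printf '#!/bin/sh\nexit 0\n' > "$work/src/.git/hooks/post-receive"
chmod +x "$work/src/.git/hooks/post-receive"
echo "demo repository" > "$work/src/.git/description"

backup_repo "$work/src/.git" "$work/mirror.git"

# Full file backup of the mirror, which no client writes to.
tar -czf "$work/mirror.tar.gz" -C "$work" mirror.git
```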

7. Restore

  • Check and adapt git configuration to target machine and latest "way of doing" philosophy.
  • Check and adapt OS configuration to target machine and latest "way of doing" philosophy.