 

How to handle a large git repository?

Tags:

git

I am currently using git for a large repository (around 12 GB, each branch having a size of 3 GB). This repository contains lots of binary files (audio and images).

The problem is that clone and pull can take a lot of time. In particular, the "Resolving deltas" step can be very long.

What is the best way to solve this kind of problem?

I tried to disable delta compression, as explained here, by unsetting the delta attribute in .gitattributes, but it does not seem to improve the clone duration.
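For reference, disabling delta compression for binary types means unsetting the `delta` attribute in `.gitattributes` (the file patterns below are examples):

```
# Don't try to delta-compress or diff these binary types
*.wav -delta binary
*.png -delta binary
*.mp3 -delta binary
```

Note that attributes only affect newly created packs; existing packs must be rebuilt (e.g. with `git repack -adf`) before the change has any effect on clone behavior.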

Thanks in advance

Kevin

Kevin MOLCARD asked Oct 12 '12 09:10


People also ask

How do I manage a large Git repository?

Using submodules One way out of the problem of large files is to use submodules, which enable you to manage one Git repository within another. You can create a submodule, which contains all your binary files, keeping the rest of the code separately in the parent repository, and update the submodule only when necessary.
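The submodule approach sketched above can look like this; the repository names and file paths are hypothetical, and a throwaway local "assets" repository stands in for a real remote:

```shell
# Demo setup: a stand-in "assets" repo holding the binary files
base="$(mktemp -d)" && cd "$base"
git init -q assets
(cd assets && git config user.email demo@example.com && git config user.name demo \
  && echo "binary placeholder" > sound.wav \
  && git add sound.wav && git commit -qm "Add binaries")

# The parent repo references the assets repo as a submodule
git init -q project && cd project
git config user.email demo@example.com && git config user.name demo
git -c protocol.file.allow=always submodule --quiet add "$base/assets" assets
git commit -qm "Track binary assets as a submodule"

# A fresh clone of the parent can skip the heavy assets entirely,
# then fetch them only when needed:
#   git clone <project-url> && cd project
#   git submodule update --init assets
```

The `protocol.file.allow=always` override is only needed here because the demo submodule lives on the local filesystem; with a normal remote URL it is unnecessary.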

How large is too large for a Git repo?

File size limits GitHub limits the size of files allowed in repositories. If you attempt to add or update a file that is larger than 50 MB, you will receive a warning from Git. The changes will still push successfully to your repository, but you should consider removing the commit to minimize the performance impact.
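If the oversized file was added in the most recent commit, removing it is a small amendment; a minimal sketch, with a hypothetical file name and a throwaway demo repo for setup:

```shell
# Demo setup: throwaway repo with an oversized (hypothetical) file committed
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
echo "fake audio data" > big-sample.wav
git add big-sample.wav && git commit -qm "Add big-sample.wav"

# Stop tracking the file but keep the working copy on disk
git rm --cached --quiet big-sample.wav
echo "big-sample.wav" >> .gitignore
git add .gitignore

# Rewrite the last commit so the large blob never reaches the remote
git commit -q --amend -m "Remove oversized binary"
git ls-tree --name-only HEAD   # big-sample.wav no longer appears
```

If the file is buried deeper in history, a full history rewrite with a tool such as git filter-repo is needed instead.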

Why is .git so large?

Git keeps track of every line change you make, so its history can grow large. However, Git uses a powerful compression mechanism that packs your files into small chunks, and it stores only the differences between file versions to reduce the size.
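You can see both effects (object store growth and pack compression) with standard Git commands; the repository below is a throwaway demo:

```shell
# Demo setup: throwaway repo with two versions of the same file
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
seq 1 1000 > data.txt && git add data.txt && git commit -qm "v1"
seq 1 2000 > data.txt && git commit -qam "v2"

# How big is the object store, and how much of it is packed?
git count-objects -v

# Repack so Git's delta compression can squeeze the history,
# then look at the numbers again
git gc --quiet --aggressive --prune=now
git count-objects -v
```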


1 Answer

Update April 2015: Git Large File Storage (LFS) (by GitHub).

It uses git-lfs (see git-lfs.github.com), tested with a server supporting it: lfs-test-server.
You store only metadata in the git repo, and the large files elsewhere.

https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif
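As an illustration of that "metadata only" idea (the `*.wav` pattern is an example): after `git lfs install`, running `git lfs track "*.wav"` writes a filter rule to `.gitattributes`, and each tracked file is then committed as a small pointer instead of the actual content:

```
# .gitattributes entry written by `git lfs track "*.wav"`:
*.wav filter=lfs diff=lfs merge=lfs -text

# What actually gets committed for each *.wav file is a small pointer
# (the oid below is a placeholder, not a real digest):
version https://git-lfs.github.com/spec/v1
oid sha256:0000000000000000000000000000000000000000000000000000000000000000
size 4194304
```

The real content is uploaded to the LFS server on push and downloaded on demand, so clones only transfer these tiny pointers plus the files actually checked out.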


Original answer (2012)

One solution, for large binary files that don't change much, is to store them in a separate referential (like a Nexus artifact repository), and version only a text file which declares which version of each binary you need.
Using an "artifact repository" is easier than storing binary elements in a source repo, which is designed for comparing versions and merging between branches, neither of which is of much use for such binaries.
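A minimal sketch of that manifest approach (the file names, repository URL, and layout are all hypothetical):

```shell
cd "$(mktemp -d)"

# binaries.txt is the only thing versioned in git:
# one "<path> <version>" entry per binary
cat > binaries.txt <<'EOF'
audio/intro.wav 1.3.0
images/logo.png 2.0.1
EOF

# Turn each entry into the artifact-repository URL it should be fetched from;
# a real build script would hand these URLs to curl or a Nexus client
base="https://nexus.example.com/repository/binaries"
while read -r path version; do
  echo "$base/$version/$path"
done < binaries.txt
```

This keeps the git history tiny: updating a binary is a one-line change to the manifest rather than a new multi-megabyte blob.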

The other solution, more git-centric, is git-annex:

git-annex allows managing files with git, without checking the file contents into git.
While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, time, or disk space.

It is, however, not compatible with Windows.

A more generic solution could be git-media, which also allows you to use Git with large media files without storing the media in Git itself.

Finally, the easiest solution is to isolate those binaries in their own git submodule, as you mention in your question: it isn't very satisfactory, and the initial clone will still take time, but subsequent updates of the parent repo will be quick.

VonC answered Sep 18 '22 09:09