We have a git repo containing both source code and binaries. The bare repo has now reached ~9GB, and cloning it takes ages. Most of the time is spent in "remote: Compressing objects". After a commit with a new version of one of the bigger binaries, a fetch takes a long time, also spent compressing objects on the server.
After reading "git pull without remotely compressing objects", I suspect delta compression of the binary files is what is hurting us as well, but I'm not 100% sure how to go about fixing it.
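To confirm that the big binaries really are the culprit, I believe something like this should list the largest objects in the existing packs (run inside the bare repo; the pack path glob and the SHA variable are just placeholders):

git verify-pack -v objects/pack/pack-*.idx | sort -k 3 -n | tail -n 10   # biggest objects last
git rev-list --objects --all | grep "$SHA"   # SHA taken from the output above (placeholder)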
What are the exact steps to fix the bare repo on the server? My guess is to disable delta compression for the binary files and then repack the repository. Am I on to something?
Update:
Some interesting test results on this. Today I started a bare clone of the problematic repo. Our not-so-powerful server with 4 GB of RAM ran out of memory and started swapping. After 3 hours I gave up...
Then I instead cloned a bare repo from my up-to-date working copy. Cloning that one between workstations took ~5 minutes. I then pushed it up to the server as a new repo. Cloning that repo took only 7 minutes.
If I interpret this correctly, a better-packed repo performs much better, even without disabling delta compression for binary files. I guess this means the steps above are indeed what I want to do in the short term, but in addition I need to find out how to limit the amount of memory git is allowed to use for packing/compression on the server, so I can avoid the swapping.
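From what I have read so far, these look like the relevant knobs; the values are only examples for our 4 GB server, and pack.windowMemory is a per-thread limit, so pack.threads matters too:

git config pack.threads 1             # windowMemory is applied per thread
git config pack.windowMemory 100m     # cap memory used for the delta search
git config pack.deltaCacheSize 100m   # cap the delta cache while packing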
In case it matters: The server runs git 1.7.0.4 and the workstations run 1.7.9.5.
Update 2:
I ran the following steps on my test repo, and I think I will take the chance and run them on the server (after a backup):
Limit memory usage when packing objects
git config pack.windowMemory 100m   # per-thread cap on memory for the delta search window
git config pack.packSizeLimit 200m  # split output into packs of at most 200 MB each
Disable delta compression for some extensions
echo '*.tar.gz -delta' >> info/attributes
echo '*.tar.bz2 -delta' >> info/attributes
echo '*.bin -delta' >> info/attributes
echo '*.png -delta' >> info/attributes
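An alternative I have seen mentioned, if the server's Git is new enough (1.7.0.4 is probably too old for this), is to skip delta compression for anything above a size threshold instead of listing extensions:

git config core.bigFileThreshold 1m   # needs a newer Git than 1.7.0.4; 1m is only an example value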
Repack repository and collect garbage
git repack -a -d -F --window-memory 100m --max-pack-size 200m
git gc
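To check whether the repack actually helped, I compare the object statistics and the on-disk pack size before and after (run inside the bare repo):

git count-objects -v   # "size-pack" is reported in KiB and should drop noticeably
du -sh objects/pack    # on-disk size of the pack files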
Update 3:
Some unexpected side effects after this operation: see "Issues after trying to repack a git repo for improved performance".
Git cannot handle large files well on its own, which is why many teams add Git LFS to deal with them.
Regarding Git's compression of blobs and packfiles: many users of Git are curious about the lack of delta compression at the object (blob) level when commits are first written. That efficiency is deferred until a pack file is written; loose objects are stored zlib-compressed, but in non-delta form, at the time of each commit.
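This is easy to observe in a throwaway repository (all names below are just examples):

git init /tmp/pack-demo && cd /tmp/pack-demo
echo 'some content' > file.txt
git add file.txt
git commit -m 'initial commit'
git count-objects -v   # objects are loose: zlib-compressed, not deltified
git gc                 # writes a pack; delta compression happens here
git count-objects -v   # the objects now live in a pack (see "in-pack")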
Git LFS (Large File Storage) is a Git extension developed by Atlassian, GitHub, and a few other open source contributors that reduces the impact of large files in your repository by downloading the relevant versions of them lazily.
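As a rough sketch of what adopting Git LFS looks like (the git-lfs client has to be installed separately, and the patterns and file name below are only examples):

git lfs install                    # sets up the LFS filters in git config
git lfs track '*.bin' '*.tar.gz'   # writes the patterns to .gitattributes
git add .gitattributes
git add big-firmware.bin           # example file; stored as an LFS pointer
git commit -m 'Track large binaries with Git LFS'

Note that this only affects new commits; moving existing history over to LFS is a separate step (git lfs migrate) and rewrites history.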
Delta compression (also called delta encoding, or just delta coding) stores only the differences from a known base file rather than the full contents. To decompress, you apply the stored changes (also called "diffs") to the base file, leaving you with the new file.
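The idea can be illustrated with plain text tools; this is only an analogy, since Git's pack deltas use their own binary format rather than diff/patch:

printf 'line one\nline two\n' > base.txt
cp base.txt new.txt && echo 'line three' >> new.txt
diff base.txt new.txt > changes.diff           # store only the difference (the "delta")
patch -o rebuilt.txt base.txt < changes.diff   # base + delta => the new file
cmp rebuilt.txt new.txt                        # no output: the files are identical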
While your question asks how to make your current repo more efficient, I don't think that's feasible. Follow the advice of the crowd and look at Git LFS (or similar) for the large binaries.