
How to use git repack -a -d --depth=250 --window=250

Tags: git

I've seen git gc --aggressive --prune and git repack -a -d --depth=250 --window=250 recommended for reducing the size of your local .git folder when a long local history is not needed. From my reading it seems git-repack is preferred. Can anyone comment on this?

What I really want to know is how to decide on values for depth and window. I use git to commit, push, pull and merge, but I have no idea what a delta chain or an object window is.

Jake asked Feb 12 '13 21:02



2 Answers

I ran some tests with different values. This is too large to be a comment on twalberg's answer.

My company has a code base that has been in svn, mercurial, and now git. It is 10 years old, with 21,000 commits.

Before the repack it was 3.1 GB. After the repack, it shrank to the following sizes
(running each repack on a fresh clone of the 3.1 GB folder):

git repack -a -d --depth=50 --window=10 -f
141.584 MB

git repack -a -d --depth=250 --window=1000 -f
110.484 MB

git repack -a -d --depth=500 --window=1000 -f
110.204 MB

They took about 5, 15 and 30 minutes respectively on my quad-core Mac.
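
A comparison like this could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names pack_bytes and try_repack are made up for illustration, the repository is assumed to be non-bare, and each run works on a plain cp -R copy so that it starts from the same state as the original folder. Sizes are reported in bytes rather than MB.

```shell
#!/bin/sh
# Hypothetical helper: total size in bytes of all packfiles in a non-bare repo.
pack_bytes() {
    cat "$1"/.git/objects/pack/*.pack 2>/dev/null | wc -c | tr -d ' '
}

# Hypothetical helper: copy the repo to a scratch directory, repack the copy
# with the given depth and window, print the resulting pack size, clean up.
# Usage: try_repack REPO DEPTH WINDOW
try_repack() {
    work=$(mktemp -d) || return 1
    cp -R "$1" "$work/repo" &&
        git -C "$work/repo" repack -a -d -f -q --depth="$2" --window="$3" &&
        echo "depth=$2 window=$3 bytes=$(pack_bytes "$work/repo")"
    status=$?
    rm -rf "$work"
    return "$status"
}

# Example (the path is a placeholder):
#   try_repack /path/to/repo 50 10
#   try_repack /path/to/repo 250 1000
```

Working on a throwaway copy keeps the original repository untouched, so several depth/window pairs can be tried before committing to one.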


Update:

I took the result of the second repack (depth 250, window 1000) and repacked it again with depth 500 and window 1000, to see whether it makes any difference if you start from a fresh 3.1 GB repo or from an already repacked 110 MB repo.

git repack -a -d --depth=250 --window=1000 -f
110.484 MB
git repack -a -d --depth=500 --window=1000 -f
110.212 MB

Verdict: repacking with depth 500 and window 1000 resulted in a pack of about 110.2 MB whether or not the repo had already been repacked.

Update2:

I was also curious whether running a repack with lower values on an already repacked repo would cause the size to increase.

git repack -a -d --depth=500 --window=1000 -f
110.204 MB
git repack -a -d --depth=50 --window=10 -f  
142.056 MB

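Verdict: the repack with lower values caused the repo size to balloon back up to ~140 MB from 110 MB (-f discards the existing deltas and recomputes them under the new, smaller limits).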

spuder answered Oct 22 '22 05:10



"Object window" - when repacking git compares each object (every version of every file, every directory tree object, every commit message, every tag...) against a certain number of other similar-ish objects to find one that creates the smallest delta - roughly speaking, the smallest patch that can create this object from that base object.

"Delta chain" - When, in order to re-create object A, you first have to check out object B and apply a delta to it, but in order to create B you need object C, which requires D ....

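The --depth option caps how long such a chain may get. To see how deep the chains in an existing pack actually are, git verify-pack -v lists every object and, for deltified ones, the chain depth and the base object. The sketch below assumes that output layout (seven whitespace-separated fields on deltified lines, with the depth in the sixth); max_delta_depth is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `git verify-pack -v` output on stdin and print
# the deepest delta chain found. Only deltified object lines have 7 fields:
#   <id> <type> <size> <size-in-pack> <offset> <depth> <base-id>
max_delta_depth() {
    awk 'NF == 7 && $6 + 0 > max { max = $6 + 0 } END { print max + 0 }'
}

# Example, run from the top of a non-bare repository:
#   git verify-pack -v .git/objects/pack/pack-*.idx | max_delta_depth
```

The same verify-pack output also ends with a short histogram of chain lengths, which gives a quick feel for how many objects sit deep in a chain.
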
Up to a point, increasing both depth and window can give you smaller packs. However, there are tradeoffs.

For window, a higher setting means that git repack compares each object against more candidates while it is running, resulting in (potentially significantly) longer running time for git repack. However, once the pack is generated, window has no effect on further operations (outside of other repacks, anyway).

depth, on the other hand, has less impact on the run time of git repack itself (although it still affects it somewhat), but the deeper your delta chains get, the longer it takes to rebuild an old object from the sequence of base objects it depends on. That means longer times for things like checkout when you're referencing older commits, so it can have a significant impact on the perceived efficiency of git if you do a lot of digging through your history. And, since git doesn't create deltas only against older objects, you can on occasion find a recent object that is slow to extract because it sits a number of levels down a chain. It's not as common as with older objects, but it does happen.

I personally use window=1024 and depth=256 on all my repos except for a couple of clones of very large projects (e.g. Linux kernel).
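
If you settle on values you want to reuse, they can be stored in the repository's config instead of being typed on every repack, since pack.window and pack.depth are what git uses when the flags are omitted (the stock defaults are window 10 and depth 50). A minimal sketch, where set_pack_defaults is a made-up helper name and the numbers in the example are simply the ones quoted above:

```shell
#!/bin/sh
# Hypothetical helper: record window/depth in a repo's local config so that
# a plain `git repack -a -d` uses them without command-line flags.
# Usage: set_pack_defaults REPO WINDOW DEPTH
set_pack_defaults() {
    git -C "$1" config pack.window "$2" &&
        git -C "$1" config pack.depth "$3"
}

# Example with the values from this answer:
#   set_pack_defaults . 1024 256
#   git repack -a -d -f   # -f recomputes deltas instead of reusing old ones
```

Without -f, deltas already present in the pack tend to be reused, so the new settings mostly affect objects that have not been deltified yet.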

twalberg answered Oct 22 '22 05:10
