I've seen git gc --aggressive --prune and git repack -a -d --depth=250 --window=250 recommended for reducing the size of your local .git folders where a long local history is not needed.
From my reading it seems git-repack is preferred. Can anyone comment on this?
What I really want to know is how to decide on values for depth and window. I use git to commit, push, pull and merge; I have no idea what a delta chain or object window is.
I ran some tests with different values. This is too large to be a comment on twalberg's answer.
My company has a code base that has been in svn, mercurial, and now git. It is 10 years old, with 21,000 commits.
Before the repack it was 3.1 GB. After the repack, it shrank to the following sizes (each repack was run on a fresh clone of the 3.1 GB folder):
git repack -a -d --depth=50 --window=10 -f
141.584 MB
git repack -a -d --depth=250 --window=1000 -f
110.484 MB
git repack -a -d --depth=500 --window=1000 -f
110.204 MB
They took about 5, 15 and 30 minutes respectively on my quad-core Mac.
Update:
I took the result of the second repack (depth 250, window 1000) and reran the repack with depth 500 and window 1000, to see whether starting from an already repacked 110 MB repo gives a different result than starting from the fresh 3.1 GB repo.
git repack -a -d --depth=250 --window=1000 -f
110.484 MB
git repack -a -d --depth=500 --window=1000 -f
110.212 MB
Verdict: the repack with depth 500 and window 1000 resulted in a pack of about 110.2 MB regardless of whether the repo had already been repacked.
Update2:
I was also curious whether running a repack with lower values on an already repacked repo would cause the size to increase.
git repack -a -d --depth=500 --window=1000 -f
110.204 MB
git repack -a -d --depth=50 --window=10 -f
142.056 MB
Verdict: repacking with the lower values caused the repo size to balloon back up from 110 MB to ~140 MB.
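Measurements like the ones above could be reproduced with something along these lines. This is only a sketch, not the script used for the numbers in this answer: pack_size_kib and measure_repack are hypothetical helper names, and copying the repo with cp -R is an assumption about how each "fresh" copy is obtained. It relies on git count-objects -v, which reports the total pack size in KiB on its size-pack line.

```shell
# Read `git count-objects -v` output on stdin and print the size-pack value (KiB).
pack_size_kib() {
    awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Hypothetical helper: repack a scratch copy of a repo and print its pack size.
# Usage: measure_repack /path/to/repo DEPTH WINDOW
measure_repack() {
    src=$1; depth=$2; window=$3
    work=$(mktemp -d) || return 1
    # Assumption: a plain copy stands in for the "fresh clone" used for each run.
    cp -R "$src" "$work/repo" || return 1
    # -f recomputes all deltas instead of reusing the ones already in the pack.
    git -C "$work/repo" repack -a -d -f --depth="$depth" --window="$window" || return 1
    git -C "$work/repo" count-objects -v | pack_size_kib
    rm -rf "$work"
}
```

For example, measure_repack ~/src/bigrepo 250 1000 (the path is a placeholder) would print the resulting pack size in KiB while leaving the original repo untouched.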
"Object window" - when repacking git
compares each object (every version of every file, every directory tree object, every commit message, every tag...) against a certain number of other similar-ish objects to find the one that yields the smallest delta: roughly speaking, the smallest patch that can create this object from that base object. The window setting is that number of candidate objects.
"Delta chain" - when, in order to re-create object A, you first have to reconstruct object B and apply a delta to it, but in order to create B you need object C, which in turn requires D, and so on. The depth setting caps how long such a chain is allowed to get.
Up to a point, increasing both depth and window can give you smaller packs. However, there are tradeoffs. For window, a higher setting means that git repack will compare each object with more objects while it is running, resulting in a (potentially significantly) longer running time for git repack. However, once the pack is generated, window has no effect on further operations (other than later repacks, anyway).
The depth setting, on the other hand, has less impact on the run time of git repack itself (although it still affects it somewhat), but the deeper your delta chains get, the longer it takes to rebuild an old object from the sequence of base objects required to create it. That means longer times for things like checkout when you're referencing older commits, so it can have a significant impact on the perceived efficiency of git if you do a lot of digging through your history. And since git doesn't create deltas only against older objects, you can on occasion find a recent object that is slow to extract because it sits a number of levels down the chain. That's not as common as with older objects, but it does happen.
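If you want to see how deep the delta chains in an existing pack actually are, git verify-pack -v lists every object in the pack, and deltified objects carry two extra columns: the chain depth and the ID of the base object. The sketch below pulls the largest depth out of that listing. max_chain_depth is a hypothetical helper name, and the field positions assume the column layout described in the git-verify-pack documentation.

```shell
# Read `git verify-pack -v` output on stdin and print the deepest delta chain.
# Deltified entries have 7 fields: id, type, size, size-in-pack, offset,
# depth, base-id. Non-delta entries have only 5, so they are skipped.
max_chain_depth() {
    awk '
        NF == 7 && $2 ~ /^(commit|tree|blob|tag)$/ && $6 + 0 > max { max = $6 + 0 }
        END { print max + 0 }
    '
}
```

Typical use would be git verify-pack -v .git/objects/pack/pack-*.idx | max_chain_depth. The same verify-pack output also ends with a histogram of chain lengths, which gives a quick feel for how many objects sit deep in a chain.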
I personally use window=1024 and depth=256 on all my repos except for a couple of clones of very large projects (e.g. the Linux kernel).
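One way to apply such values without typing them on every repack is to store them in the repo's config. The sketch below assumes that approach (the answer does not say how the values are set); set_pack_tuning is a hypothetical helper name. pack.window and pack.depth are the git config variables that supply the defaults when --window and --depth are not given on the command line.

```shell
# Hypothetical helper: store window/depth defaults in a repo's local config.
# Usage: set_pack_tuning /path/to/repo WINDOW DEPTH
set_pack_tuning() {
    git -C "$1" config pack.window "$2" &&
    git -C "$1" config pack.depth "$3"
}
```

After set_pack_tuning . 1024 256, a plain git repack -a -d -f in that repo should pick up those values. Note that git gc --aggressive reads its own settings, gc.aggressiveWindow and gc.aggressiveDepth, instead.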