 

Do I ever need to run git gc on a bare repo?

Tags:

git

git-gc

man git-gc doesn't have an obvious answer in it, and I haven't had any luck with Google either (although I might have just been using the wrong search terms).

I understand that you should occasionally run git gc on a local repository to prune dangling objects and compress history, among other things -- but is a shared bare repository susceptible to these same issues?

If it matters, our workflow is multiple developers pulling from and pushing to a bare repository on a shared network drive. The "central" repository was created with git init --bare --shared.

asked Aug 20 '10 by Mark Rushakoff


People also ask

When should you not run git gc?

See gc.auto below for how to disable this behavior. Running git gc manually should only be needed when adding objects to a repository without regularly running such porcelain commands, to do a one-off repository optimization, or e.g. to clean up a suboptimal mass-import.

Does git gc run automatically?

The git gc --auto command variant first checks if any housekeeping is required on the repo before executing. If it finds housekeeping is not needed it exits without doing any work. Some Git commands implicitly run git gc --auto after execution to clean up any loose objects they have created.

Should I run git gc?

You should consider running git gc manually in a few situations: if you have just completed a git filter-branch, for example. Recall that filter-branch rewrites many commits, introduces new ones, and leaves the old ones on a ref that should be removed when you are satisfied with the results.

How often does GitHub run git gc?

GitHub Support responded to this question on Twitter in 2013: "We run git gc at most once per day, triggered automatically by a push."


1 Answer

As Jefromi commented on Dan's answer, git gc should be called automatically during "normal" use of a bare repository.

I just ran git gc --aggressive on two bare, shared repositories that have been actively used; one with about 38 commits in the past 3-4 weeks, and the other with about 488 commits over roughly 3 months. Nobody had manually run git gc on either repository.

Smaller repository

$ git count-objects
333 objects, 595 kilobytes

$ git count-objects -v
count: 333
size: 595
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 325, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (323/323), done.
Writing objects: 100% (325/325), done.
Total 325 (delta 209), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 8
size: 6
in-pack: 325
packs: 1
size-pack: 324
prune-packable: 0
garbage: 0

$ git count-objects
8 objects, 6 kilobytes

Larger repository

$ git count-objects
4315 objects, 11483 kilobytes

$ git count-objects -v
count: 4315
size: 11483
in-pack: 9778
packs: 20
size-pack: 15726
prune-packable: 1395
garbage: 0

$ git gc --aggressive
Counting objects: 8548, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8468/8468), done.
Writing objects: 100% (8548/8548), done.
Total 8548 (delta 7007), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 8548
packs: 1
size-pack: 8937
prune-packable: 0
garbage: 0

$ git count-objects
0 objects, 0 kilobytes

I wish I had thought of it before I ran gc on those two repositories: I should have run git gc without the --aggressive option first to see the difference. Luckily I had a medium-sized active repository left to test (164 commits over nearly 2 months).

$ git count-objects -v
count: 1279
size: 1574
in-pack: 2078
packs: 6
size-pack: 2080
prune-packable: 607
garbage: 0

$ git gc
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1073/1073), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1210), reused 1050 (delta 669)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1092
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1742/1742), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1249), reused 0 (delta 0)

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1058
prune-packable: 0
garbage: 0

Running git gc clearly made a large dent in count-objects, even though we regularly push to and fetch from this repository. But upon reading the manpage for git config, I noticed that the default loose object limit is 6700, which we apparently had not yet reached.
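If you want to check where your own repository stands relative to that threshold, a small sketch like the following works (assumptions: git is on your PATH; the temporary repository here is created only so the snippet is self-contained -- in practice you'd run the last two commands inside your existing repo). Note that git config --get gc.auto prints nothing when the value is unset, so we fall back to the documented default of 6700.

```shell
# Create a throwaway repo just so this sketch runs anywhere.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Effective gc.auto threshold; falls back to the default when unset.
threshold=$(git config --get gc.auto || echo 6700)

# First field of `git count-objects` is the loose-object count.
loose=$(git count-objects | awk '{print $1}')

echo "gc.auto threshold: $threshold, loose objects: $loose"
```

When the loose-object count crosses the threshold, the next command that invokes git gc --auto will repack.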

So it appears that the conclusion is no, you don't need to run git gc manually on a bare repo;* but with the default setting for gc.auto, it might be a long time before garbage collection occurs automatically.


*Generally, you shouldn't need to run git gc. But sometimes you might be strapped for space, in which case you can run git gc manually or set gc.auto to a lower value. My case for the question was simple curiosity, though.
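For the space-strapped case, here is a hedged sketch of lowering gc.auto on a bare shared repo so auto-gc kicks in sooner. The repository path is made up for illustration; the value 250 is arbitrary (the default is 6700, and 0 disables auto-gc entirely).

```shell
# Hypothetical bare shared repo, created in a temp dir for illustration.
repo=$(mktemp -d)/central.git
git init -q --bare --shared "$repo"

# Lower the loose-object threshold so `git gc --auto` repacks sooner.
git -C "$repo" config gc.auto 250

# Confirm the setting took effect.
git -C "$repo" config gc.auto
```

Since the setting lives in the bare repository's own config file, it applies to every developer pushing to it, without anyone changing their local configuration.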

answered Sep 18 '22 by Mark Rushakoff