
Is there any difference between `git gc` and `git repack -ad; git prune`?

Tags: git, git-gc

Is there any difference between git gc and git repack -ad; git prune?
If yes, what additional steps will be done by git gc (or vice versa)?
Which one is better to use in regard to space optimization or safety?

Microsoft Linux TM asked Aug 09 '16 10:08


2 Answers

Is there any difference between git gc and git repack -ad; git prune?

The difference is that git gc --auto (the mode Git uses when it triggers garbage collection on its own) is very conservative about which housekeeping tasks it performs. For example, it won't run git repack unless the number of loose objects in the repository is above a certain threshold (configurable via the gc.auto variable). Also, git gc runs more tasks than just git repack and git prune.

If yes, what additional steps will be done by git gc (or vice versa)?

According to the documentation, git gc runs:

  • git-prune
  • git-reflog
  • git-repack
  • git-rerere

More specifically, by looking at the source code of gc.c (lines 338-343)¹ we can see that it invokes at most the following commands:

  • pack-refs --all --prune
  • reflog expire --all
  • repack -d -l
  • prune --expire
  • worktree prune --expire
  • rerere gc

Depending on the number of packs (lines 121-126), it may run repack with the -A option instead (lines 203-212):

    /*
     * If there are too many loose objects, but not too many
     * packs, we run "repack -d -l". If there are too many packs,
     * we run "repack -A -d -l".  Otherwise we tell the caller
     * there is no need.
     */
    if (too_many_packs())
        add_repack_all_option();
    else if (!too_many_loose_objects())
        return 0;

Notice on lines 211-212 of the need_to_gc function that if there are neither too many packs nor too many loose objects in the repository, gc --auto returns without doing any work at all.

This is further clarified in the documentation:

Housekeeping is required if there are too many loose objects or too many packs in the repository. If the number of loose objects exceeds the value of the gc.auto configuration variable, then all loose objects are combined into a single pack using git repack -d -l. Setting the value of gc.auto to 0 disables automatic packing of loose objects.

If the number of packs exceeds the value of gc.autoPackLimit, then existing packs (except those marked with a .keep file) are consolidated into a single pack by using the -A option of git repack.

As you can see, git gc strives to do the right thing based on the state of the repository.

Which one is better to use in regard to space optimization or safety?

In general it's better to run git gc --auto simply because it will do the least amount of work necessary to keep the repository in good shape – safely and without wasting too many resources.

However, keep in mind that garbage collection may already be triggered automatically by certain commands, unless this behavior is disabled by setting the gc.auto configuration variable to 0.

From the documentation:

--auto
With this option, git gc checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run git gc --auto after performing operations that could create many loose objects.

So for most repositories you shouldn't need to explicitly run git gc all that often, since it will already be taken care of for you.


1. As of commit a0a1831 made on 2016-08-08.

Enrico Campidoglio answered Nov 15 '22 12:11


git help gc contains a few hints...

The optional configuration variable gc.rerereresolved indicates how long records of conflicted merges you resolved earlier are kept.

The optional configuration variable gc.rerereunresolved indicates how long records of conflicted merges you have not resolved are kept.

I believe those records are not expired if you only run git repack -ad; git prune.

AnoE answered Nov 15 '22 11:11