See the gc.auto configuration variable (documented in git-gc(1)) for how to disable this behavior. Running git gc manually should only be needed when adding objects to a repository without regularly running such porcelain commands, to do a one-off repository optimization, or, e.g., to clean up a suboptimal mass-import.
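Here is a minimal sketch in a throwaway repository. Setting gc.auto to 0 turns off the automatic git gc --auto runs, whose default threshold is roughly 6700 loose objects. Unsetting it restores the default. The temporary directory exists only for the demo. In practice you would run the git config line inside your own repository.

```shell
# Throwaway repository for the demo (hypothetical; use your own repo in practice).
repo=$(mktemp -d)
git init -q "$repo"

# A value of 0 disables automatic "git gc --auto" runs in this repository.
git -C "$repo" config gc.auto 0
git -C "$repo" config --get gc.auto    # prints 0

# To restore the default behavior, remove the override again:
# git -C "$repo" config --unset gc.auto
```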
In general, git gc is safe to run. It won't throw away any commits reachable from any named reference. Depending on how you've set the appropriate expiration variable (e.g., gc.
git gc configuration:
An optional variable that defaults to 90 days. It is used to set how long records in a branch's reflog should be preserved.
An optional variable that defaults to 30 days.
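The names of the two variables were lost from the text above. git-config(1) documents a 90-day default for gc.reflogExpire and a 30-day default for gc.reflogExpireUnreachable, so this sketch assumes those are the two meant. The helper name show_reflog_expiry is hypothetical. It prints the configured values and falls back to the documented defaults when a variable is unset.

```shell
# Hypothetical helper: print the reflog expiry settings for a repository
# (default: current directory), falling back to Git's documented defaults.
# Assumption: these are the 90-day and 30-day variables described above.
show_reflog_expiry() {
    repo=${1:-.}
    reachable=$(git -C "$repo" config --get gc.reflogExpire || echo "90 days")
    unreachable=$(git -C "$repo" config --get gc.reflogExpireUnreachable || echo "30 days")
    echo "gc.reflogExpire=$reachable"
    echo "gc.reflogExpireUnreachable=$unreachable"
}

# Example: keep reflog entries for reachable commits for 180 days instead.
# git config gc.reflogExpire "180.days.ago"
```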
The git gc command cleans up unnecessary files and optimizes the local repository. GitHub runs this operation on its hosted repositories automatically on a regular basis based on a variety of triggers.
It depends mostly on how much the repository is used. With one user checking in once a day and a branch/merge/etc. operation once a week, you probably don't need to run it more than once a year.
With several dozen developers working on several dozen projects, each checking in 2-3 times a day, you might want to run it nightly.
It won't hurt to run it more frequently than needed, though.
What I'd do is run it now, then a week from now measure disk utilization, run it again, and measure disk utilization again. If the size drops by 5%, run it once a week. If it drops more, run it more frequently. If it drops less, run it less frequently.
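The measure-and-adjust approach above can be sketched as two small shell helpers. Both names are hypothetical. The size is taken with du -sk on the .git directory, so this assumes a non-bare working copy. The 5% rule is applied to a whole-number percentage, which means any drop from 5.0% to 5.99% counts as "5%".

```shell
# Hypothetical helper: size of a repository's .git directory in KiB.
repo_size_kb() {
    du -sk "${1:-.}/.git" | cut -f1
}

# Hypothetical helper encoding the rule of thumb above. Takes sizes measured
# before and after a gc run; the drop is an integer percentage (rounded down).
recommend_gc_schedule() {
    before=$1 after=$2
    if [ "$before" -le 0 ]; then
        echo "unknown"
        return 1
    fi
    drop=$(( (before - after) * 100 / before ))
    if [ "$drop" -gt 5 ]; then
        echo "more often than weekly"
    elif [ "$drop" -eq 5 ]; then
        echo "weekly"
    else
        echo "less often than weekly"
    fi
}

# Usage, inside the repository a week after the first gc run:
#   before=$(repo_size_kb)
#   git gc
#   after=$(repo_size_kb)
#   recommend_gc_schedule "$before" "$after"
```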
Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days in the future. The fact that git keeps most of its debris around has saved my bacon several times – by browsing all the dangling commits, I have recovered much work that I had accidentally canned.
So don’t be too much of a neat freak in your private clones. There’s little need for it.
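The following demo of the recovery described above runs in a throwaway repository. Deleting a branch makes a commit unreachable. git fsck --no-reflogs --unreachable then lists it, and a new branch brings it back. The helper g and the branch names are hypothetical. Once git gc has expired the reflog entries and pruned the object, this recovery is no longer possible.

```shell
# Throwaway repository for the demo (hypothetical; nothing touches your real repos).
demo=$(mktemp -d)
git init -q "$demo"
# Run git in the demo repo with a per-command identity, so global config is not needed.
g() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g commit -q --allow-empty -m "base"
g checkout -q -b experiment
g commit -q --allow-empty -m "work that is about to be lost"
lost=$(g rev-parse HEAD)

# Switch back and delete the branch: the commit is now reachable only from
# the HEAD reflog, which git gc eventually expires.
g checkout -q -
g branch -q -D experiment

# Ignore reflogs so fsck reports the commit as unreachable, then inspect it.
unreachable=$(g fsck --no-reflogs --unreachable)
echo "$unreachable"
g log --oneline -1 "$lost"

# Recover it by giving it a name again.
g branch rescued "$lost"
```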
OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g., the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repack frequently.
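For a shared remote where recoverability matters less, one option is git gc --prune=now. It prunes unreachable loose objects immediately. By default, gc keeps any that are younger than gc.pruneExpire, which is two weeks. Below is a sketch on a throwaway bare repository that stands in for the shared remote. The path and the scratch object are made up for the demo.

```shell
# Hypothetical throwaway bare repository standing in for a shared remote.
remote=$(mktemp -d)/shared.git
git init -q --bare "$remote"

# Write an object that nothing refers to, i.e. garbage.
obj=$(printf 'scratch data\n' | git -C "$remote" hash-object -w --stdin)

# Without --prune=now, gc would keep this object for gc.pruneExpire
# (2 weeks by default); with it, the object is removed right away.
git -C "$remote" gc --quiet --prune=now

if git -C "$remote" cat-file -e "$obj" 2>/dev/null; then
    echo "object still present"
else
    echo "object pruned"
fi
```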
Recent versions of Git run gc automatically when required, so you shouldn't have to do anything. See the OPTIONS section of the git-gc(1) man page: "Some git commands run git gc --auto after performing operations that could create many loose objects."
If you're using git-gui, it tells you when you should worry:
This repository currently has approximately 1500 loose objects.
The following command reports a similar number:
$ git count-objects
That said, judging from its source, git-gui does the math itself by counting files under the .git/objects folder, so its figure is probably an approximation (I don't know Tcl well enough to read it properly!).
In any case, it seems to warn once the count passes an arbitrary threshold of around 300 loose objects.
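If you would rather script the check than rely on git-gui, the count: line of git count-objects -v gives the number of loose objects. The function names are assumptions for this sketch. So is the default limit of 300, which comes from the rough figure above and not from git-gui's source.

```shell
# Hypothetical helper: number of loose objects, from the "count:" line
# of "git count-objects -v".
loose_object_count() {
    git -C "${1:-.}" count-objects -v | sed -n 's/^count: //p'
}

# Hypothetical helper: warn when the loose-object count exceeds a limit.
# The default of 300 mirrors the rough figure quoted above (an assumption).
warn_if_many_loose() {
    repo=${1:-.}
    limit=${2:-300}
    n=$(loose_object_count "$repo")
    if [ "$n" -gt "$limit" ]; then
        echo "$n loose objects (> $limit): consider running git gc"
    else
        echo "$n loose objects: no action needed"
    fi
}
```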
Drop it in a cron job that runs every night (afternoon?) when you're sleeping.
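Here is a sketch of such a nightly job. It assumes your bare repositories live under one directory. The /srv/git default and the script path in the crontab comment are hypothetical. A repository where gc fails is reported on stderr, and the loop moves on to the next one.

```shell
#!/bin/sh
# Hypothetical nightly job: run "git gc --quiet" in every bare repository
# (*.git) directly under a directory. /srv/git is an example default path.
gc_all_repos() {
    root=${1:-/srv/git}
    for repo in "$root"/*.git; do
        [ -d "$repo" ] || continue   # skip if the glob matched nothing
        git -C "$repo" gc --quiet || echo "gc failed in $repo" >&2
    done
}

gc_all_repos "$@"

# Example crontab entry (02:30 every night; the script path is hypothetical):
# 30 2 * * * /usr/local/bin/nightly-gc.sh /srv/git
```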