Garbage collect commits in git

Tags:

git

I have some commits created by git subtree that I want to have garbage collected (less for any practical purpose than to understand what can get collected, and why).

I have already checked that these commits are not referenced in any of the following ways:

# In any reflog
> git reflog --all --no-abbrev-commit | grep <hash>
(no output)

# In any branch, local or remote
> git branch --contains <hash>
(no output)
> git branch -r --contains <hash>
(no output)

# In any tag
> git tag --contains <hash>
(no output)

# In the history reachable from HEAD
> git rev-list HEAD | grep <hash>
(no output)

# In references from filter-branch
> ls .git/refs/original/
(the folder does not exist)

These are the places that the git gc documentation lists as possibly containing references.

Yet the commits still exist after running git gc.

Am I missing something? Or is there a git plumbing command that checks all of these references?

Maic López Sáenz asked Feb 20 '13 23:02


2 Answers

Every time I want to delete loose objects, I use the following commands:

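# Drop the backup refs left behind by git filter-branch
rm -rf .git/refs/original/*
# Expire reflog entries that are not reachable from the current tip of their ref
git reflog expire --all --expire-unreachable=0
# Repack into a single pack; unreachable objects from old packs become loose
git repack -A -d
# Delete the loose objects that nothing references any more
git prune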
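
Before running destructive commands like these, it can help to confirm what, if anything, still keeps a commit alive. The following is a minimal sketch, not a built-in git feature: the function names `holders` and `is_reachable` are hypothetical helpers, and the check only covers refs and reflogs (it does not look at the index or at other worktrees).

```shell
#!/bin/sh
# Hypothetical helpers (not part of git) for checking what still keeps a commit alive.

# Print every ref (branch, remote branch, tag, refs/original, stash, ...)
# whose history contains the given commit.
holders() {
    git for-each-ref --format='%(refname)' --contains "$1"
}

# Print "reachable" if the commit can be reached from any ref, HEAD,
# or any reflog entry; print "unreachable" otherwise.
is_reachable() {
    full=$(git rev-parse --verify --quiet "$1^{commit}") || return 2
    if git rev-list --all --reflog | grep -q "^$full\$"; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# When called with a commit as argument, report on it.
if [ $# -gt 0 ]; then
    holders "$1"
    is_reachable "$1"
fi
```

A commit reported as "unreachable" here is a candidate for pruning, while any ref name printed by `holders` would have to be deleted or moved first.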
William Seiti Mizuta answered Oct 05 '22 01:10


Commits (or objects in general) aren't actually deleted until they've been unpacked into loose objects and left that way for at least 2 weeks. You can use git gc --prune=now to skip the 2 week delay.

Normally what happens is git will pack your objects together into a packfile. This provides much better compression and efficiency than having loose objects. This typically happens whenever a git gc is executed. However, if an object is unreferenced, then git gc will unpack it back into a loose object.
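
The move from loose objects to a packfile can be observed with `git count-objects -v`. The sketch below assumes a throwaway repository created with `mktemp -d`; the file name and identity settings are made up for illustration.

```shell
#!/bin/sh
# Sketch: watch loose objects move into a packfile when git gc runs.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # illustrative identity
git config user.name "Demo"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "first commit"

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

Because every object in this small repository is referenced, all of them should end up in the pack.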

Once unpacked, git gc will automatically prune old loose unreferenced objects. This is controlled by the --prune=<date> flag, which defaults to 2 weeks ago, so only unreferenced objects older than 2 weeks are pruned. By specifying --prune=now, you're asking git gc to prune any objects that are older than right now, which basically means pruning every unreferenced object that exists.
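
To see this end to end, here is a minimal sketch in a throwaway repository. The branch name `tmp` and the commit messages are made up for illustration. It also shows why `--prune=now` alone is often not enough: as long as a reflog entry mentions the commit, it still counts as referenced.

```shell
#!/bin/sh
# Sketch: an abandoned commit survives gc while a reflog mentions it,
# and is pruned once the reflog entries are expired.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # illustrative identity
git config user.name "Demo"

git commit -q --allow-empty -m "base"

# Create a commit on a temporary branch, then delete the branch.
git checkout -q -b tmp
git commit -q --allow-empty -m "doomed"
doomed=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D tmp

# No branch points at the commit now, but the HEAD reflog still does.
git gc -q --prune=now
if git cat-file -e "$doomed" 2>/dev/null; then before=present; else before=pruned; fi

# Expire every reflog entry, then collect again.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$doomed" 2>/dev/null; then after=present; else after=pruned; fi

echo "after first gc (reflog intact):   $before"
echo "after second gc (reflog expired): $after"
```

With a default configuration, the first line should report the commit as present and the second as pruned.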

Lily Ballard answered Oct 05 '22 03:10