
git gc --aggressive --prune=all does not remove big file from repository

There are many SO questions about how to remove an accidentally added big file from a repo, and many of them suggest using the git gc command. However, it does not work for me and I don't know what is going wrong.

Here is what I have done:

$ git init
Initialized empty Git repository in /home/wzyboy/git/myrepo/.git/
$ echo hello >> README
$ git add README 
$ git commit -a -m 'init commit'
[master (root-commit) f21783f] init commit
 1 file changed, 1 insertion(+)
 create mode 100644 README
$ du -sh .git
152K    .git
$ cp ~/big.zip .
$ git add big.zip 
$ git commit -a -m 'adding big file'
[master 3abd0a4] adding big file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 big.zip
$ du -sh .git
77M .git
$ git log --oneline 
3abd0a4 adding big file
f21783f init commit
$ git reset --hard f21783f
HEAD is now at f21783f init commit
$ git log --oneline 
f21783f init commit
$ git gc --aggressive --prune=all
Counting objects: 6, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 0 (delta 0)
$ git gc --aggressive --prune=now
Counting objects: 6, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
$ du -sh .git
77M .git
$ git version
git version 2.2.2

In the console output above, I created a new git repo and added one small text file; the .git directory was 152K in size, so far so good. Then I added a big file to the repo and the directory bloated to 77M. However, after attempting to remove the big file (with git reset --hard or git rebase -i), I cannot recover the disk space claimed by the big file, no matter which options I pass to git gc.

Could anyone tell me why git gc does not work in my case? What should I do to recover the disk space? Is it possible to recover the disk space using git gc instead of git filter-branch?

Thanks.

Zhuoyun Wei asked Feb 04 '15 02:02




2 Answers

As Andrew C suggested, one needs to expire the reflog to dereference the objects before git gc can discard them. After git reset --hard, the reflog still points to the dropped commit, so the big file counts as reachable and gc keeps it. So the correct way to recover the disk space claimed by accidentally added big files is:

git reflog expire --expire=now --all
git gc --aggressive --prune=now

This removes every reflog entry, and with it the usual safety net for recovering discarded commits, so use with caution.
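
To check that the two commands really release the object, here is a minimal sketch that replays the steps from the question in a throwaway repository. It assumes a POSIX-like shell with git and mktemp available; the file name big.bin, the 5 MB size, and the inline user identity are illustrative choices, not part of the original question.

```shell
#!/bin/sh
set -e

# Throwaway repository; the location comes from mktemp (illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo hello > README
git add README
git commit -q -m 'init commit'
first=$(git rev-parse HEAD)

# Random data does not compress, so the blob is easy to spot.
head -c 5000000 /dev/urandom > big.bin
git add big.bin
git commit -q -m 'adding big file'
blob=$(git rev-parse HEAD:big.bin)

# Drop the commit from the branch, as in the question.
git reset -q --hard "$first"

# The reflog still references the dropped commit, so gc keeps the blob.
git gc -q --aggressive --prune=now
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Expire the reflog, then gc again: the blob is now unreachable.
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "after gc only: $before"
echo "after reflog expire + gc: $after"
```

On a typical Git installation this should report the blob as present after the first gc and gone after the second. Checking with git cat-file -e instead of du makes the result independent of filesystem block sizes.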

Zhuoyun Wei answered Oct 22 '22 03:10


One related tip that helps catch typos: since Git 2.18 (Q2 2018), git gc no longer fails silently when --prune is given a value it cannot parse (called "nonsense" in the release notes):

"git gc --prune=nonsense" spent long time repacking and then silently failed when underlying "git prune --expire=nonsense" failed to parse its command line.
This has been corrected.

See commit 96913c9 (23 Apr 2018) by Junio C Hamano (gitster).
Helped-by: Linus Torvalds (torvalds).
(Merged by Junio C Hamano -- gitster -- in commit 3915f9a, 08 May 2018)

parseopt: handle malformed --expire arguments more nicely

A few commands that parse --expire=<time> command line option behave sillily when given nonsense input.
For example

$ git prune --no-expire
Segmentation fault
$ git prune --expire=npw; echo $?
129

Both come from parse_opt_expiry_date_cb().

The former is because the function is not prepared to see arg==NULL (for "--no-expire", it is a norm; "--expire" at the end of the command line could be made to pass NULL, if it is told that the argument is optional, but we don't so we do not have to worry about that case).

The latter is because it does not check the value returned from the underlying parse_expiry_date().
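
A quick way to observe the corrected behavior is to pass the misspelled dates to git prune and git gc and look at the exit status. The sketch below assumes Git 2.18 or later and a throwaway repository created with mktemp; the exact error text and exit code may differ between versions, so it only reports whether each command failed.

```shell
#!/bin/sh
# Throwaway repository; the location comes from mktemp (illustrative).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# "npw" is the deliberate typo for "now" used in the commit message.
git prune --expire=npw
prune_status=$?

# The same kind of typo passed through git gc.
git gc --prune=nonsense
gc_status=$?

echo "git prune exit status: $prune_status"
echo "git gc exit status: $gc_status"
```

A non-zero status from both commands, together with an error message on stderr, indicates that the malformed date was rejected up front instead of after a long repack.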

VonC answered Oct 22 '22 02:10