BFG Repo Cleaner is not working as expected

I am trying to reduce the size of a largish repo (~3.4 G), and bfg-repo-cleaner seemed like the perfect tool for the job.

I ran the tool as described in the docs but am only seeing minor reductions in the size of the repo. What is particularly surprising is that some (but not all) of the blobs that the tool says it removed (listed in deleted-files.txt) are still very much in the repository. I really don't want to start messing with git filter-branch, so any help would be appreciated.

I intentionally went with the aggressive --no-blob-protection option to maximize the effect. I've included the commands I ran with the truncated output.

git count-objects -vH

count: 0
size: 0 bytes
in-pack: 1616184
packs: 1
size-pack: 3.38 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

du -rh -d 0

3.4G    .

java -jar ~/Downloads/bfg-1.12.12.jar --strip-blobs-bigger-than 2M --no-blob-protection ./

Scanning packfile for large blobs: 1616184
Scanning packfile for large blobs completed in 33,465 ms.
Found 242 blob ids for large blobs - biggest=497179278 smallest=2098032
Total size (unpacked)=3534794122
Found 0 objects to protect
Found 4965 tag-pointing refs : ...
Found 8519 commit-pointing refs :  ...

Protected commits
-----------------

You're not protecting any commits, which means the BFG will modify the contents of even *current* commits.

This isn't recommended - ideally, if your current commits are dirty, you should fix up your working copy and commit that, check that your build still works, and only then run the BFG to clean up your history.

Cleaning
--------

Found 110364 commits
Cleaning commits:       100% (110364/110364)
Cleaning commits completed in 345,977 ms.

Updating 13483 Refs
-------------------

Ref                                                                                                                  Before     After
----------------------------------------------------------------------------------------------------------------------------------------
...

Updating references:    100% (13483/13483)
...Ref update completed in 15,354 ms.

Commit Tree-Dirt History
------------------------

Earliest                                              Latest
|                                                          |
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

                        Before     After
-------------------------------------------
First modified commit | 757f8383 | c11fc923
Last dirty commit     | e28d047b | 92b88b05

Deleted files
-------------
...

In total, 418853 object ids were changed. Full details are logged here:

..bfg-report/2016-04-18/10-24-49

git count-objects -vH

count: 419093
size: 1.62 GiB
in-pack: 1616184
packs: 1
size-pack: 3.38 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

du -rh -d 0

5.1G    .

git reflog expire --expire=now --all && git gc --prune=now --aggressive

Counting objects: 1905870, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1786570/1786570), done.
Writing objects: 100% (1905870/1905870), done.
Total 1905870 (delta 1274991), reused 482300 (delta 0)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1905870, done.

git count-objects -vH

count: 0
size: 0 bytes
in-pack: 1905870
packs: 1
size-pack: 3.03 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

head ..bfg-report/2016-04-18/10-24-49/deleted-files.txt

8afa72875d3013620bb122916bd1ec33a066cbf2 1075353 file_name1.gpx
7656f6464c67f92c48cdbb03ec5a81067c636238 1644202 file_name2.csv
ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5 773236 file_name3.ttf
86c9c0b55ff99c3789bb3ed17daf51bebacba1cb 870631 [email protected]
70c928943feab0a3a1f97b4f752e9dbc1d8f37fa 950305 [email protected]
3862d0da43f5902c75e86ff0dd925d8cca601de3 779356 [email protected]
6effce4b245961cb46e2cf3f4d05bd6c8c182760 908017 [email protected]
1866b1053dd48fc4d0677f03feb4baf2f67b567c 1353732 file_name8.gif
f0d984f00678504fe073110bb6553049e9678755 1350785 file_name9.gif
af877d286b12b9f79560a938375abe04a15ff405 3214192 file_name10.gif

git cat-file -s 8afa72875d3013620bb122916bd1ec33a066cbf2

1075353
Zachary Cicala asked Apr 18 '16 18:04


2 Answers

I've figured out the problem. We had a lot of old branches that still pointed to trees with large blobs. Deleting these branches and rerunning BFG gave me a multi-gigabyte reduction.

I had thought that the --no-blob-protection flag would have addressed this state.
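
For anyone trying to track down which refs are still holding on to a blob, something along these lines may help. This is a minimal sketch, assuming a POSIX shell with git on the PATH; the function name refs_reaching_blob is hypothetical and not part of git or BFG. It walks every ref individually, so on a repository with thousands of refs it will be slow.

```shell
# List every ref whose history still contains the given blob id.
# refs_reaching_blob is a hypothetical helper name, not a git or BFG command.
refs_reaching_blob() {
    blob="$1"
    git for-each-ref --format='%(refname)' | while read -r ref; do
        # rev-list --objects prints every object id reachable from the ref,
        # one per line, with the id at the start of the line
        if git rev-list --objects "$ref" | grep -q "^$blob"; then
            echo "$ref"
        fi
    done
}
```

Called as refs_reaching_blob 8afa72875d3013620bb122916bd1ec33a066cbf2, it prints the refs that still reach that blob. A stale local branch found this way can then be removed with git branch -D before rerunning BFG.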

Zachary Cicala answered Sep 22 '22 22:09


I found that each time I reran BFG with the same arguments, it found more commits to clean. Eventually it reported:

BFG aborting: No refs to update - no dirty commits found??

At that point, running git reflog expire and git gc reduced the pack size.
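
To confirm that the blobs really are gone after the final gc, the ids in deleted-files.txt can be checked against the object database. The following is a rough sketch, assuming the report format shown in the question (blob id, size, file name per line); surviving_blobs is a hypothetical helper name. Note that git cat-file -e also succeeds for unreachable objects that have not been pruned yet, so the check is only meaningful after the reflog expire and gc step.

```shell
# Print each blob from a BFG deleted-files.txt that still exists in the repo.
# surviving_blobs is a hypothetical helper name, not a git or BFG command.
surviving_blobs() {
    report="$1"
    while read -r id size name; do
        # skip blank lines in the report
        [ -n "$id" ] || continue
        # cat-file -e exits 0 only if the object is present in the object database
        if git cat-file -e "$id" 2>/dev/null; then
            echo "$id $name"
        fi
    done < "$report"
}
```

Run from inside the repository with the path to the report as the only argument. Empty output means none of the listed blobs remain.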

akatakritos answered Sep 19 '22 22:09