
Why is BFG changing my latest commit?

git filter-branch was taking a long time. Happily, I found BFG repo-cleaner.

But it is unexpectedly changing the contents of my last commit.

$ git clone --mirror example.com:/repo.git
$ cd repo.git
$ git log HEAD^!
commit 5f737d28756d4854d25899632abffe7cca2c7423
Author: Paul Draper <[email protected]>
Date:   Sat Jan 24 19:31:47 2015 -0700

    Fix /contact and /folderEntries/listFoldersSimple
$ git diff --stat HEAD^!
 cake/app/controllers/folder_entries_controller.php |     1 +

Then I clean the repository with BFG, stripping blobs larger than 1 MB.

$ java -jar ~/bfg-1.12.0.jar -b 1M
...
In total, 161797 object ids were changed. Full details are logged here:
...
$ git log HEAD^!
commit 3ff700cebe32497423435b416ea11169b7fcbf90
Author: Paul Draper <[email protected]>
Date:   Sat Jan 24 19:31:47 2015 -0700

    Fix /contact and /folderEntries/listFoldersSimple


    Former-commit-id: 5f737d28756d4854d25899632abffe7cca2c7423
$ git diff --stat HEAD^!
 cake/app/controllers/folder_entries_controller.php |     1 +
 .../lucidchart-tools/caja/ant-jars/guava-r09.jar   |   Bin 0 -> 1141964 bytes
 .../caja/ant-jars/guava-r09.jar.REMOVED.git-id     |     1 -
 cake/app/lucidchart-tools/caja/ant-jars/js.jar     |   Bin 0 -> 1122370 bytes
 .../caja/ant-jars/js.jar.REMOVED.git-id            |     1 -
 .../lucidchart-tools/caja/ant-jars/pluginc-src.jar |   Bin 0 -> 5172676 bytes
 .../caja/ant-jars/pluginc-src.jar.REMOVED.git-id   |     1 -
 .../app/lucidchart-tools/caja/ant-jars/pluginc.jar |   Bin 0 -> 2959487 bytes
 .../caja/ant-jars/pluginc.jar.REMOVED.git-id       |     1 -
 .../lucidchart-tools/caja/ant-jars/xercesImpl.jar  |   Bin 0 -> 1229125 bytes
 .../caja/ant-jars/xercesImpl.jar.REMOVED.git-id    |     1 -
 cake/app/lucidchart-tools/jsdoc/rhino/js.jar       |   Bin 0 -> 1111429 bytes
 .../jsdoc/rhino/js.jar.REMOVED.git-id              |     1 -
 cake/app/lucidchart-tools/selenium/chromedriver    |   Bin 0 -> 5778064 bytes
 .../selenium/chromedriver.REMOVED.git-id           |     1 -
 .../selenium/selenium-server-standalone-2.37.0.jar |   Bin 0 -> 34730734 bytes
 ...ium-server-standalone-2.37.0.jar.REMOVED.git-id |     1 -
 .../selenium-server-standalone-2.42.2-mod.jar      |   Bin 0 -> 34873583 bytes
 ...server-standalone-2.42.2-mod.jar.REMOVED.git-id |     1 -
 .../selenium/selenium-server-standalone-2.42.2.jar |   Bin 0 -> 34823352 bytes
 ...ium-server-standalone-2.42.2.jar.REMOVED.git-id |     1 -
 .../lucidchart-tools/test-runner-1.0-SNAPSHOT.jar  |   Bin 0 -> 9732125 bytes
 .../test-runner-1.0-SNAPSHOT.jar.REMOVED.git-id    |     1 -
 .../CommandLine/Scaffolders/DefaultScaffolder.phar |   Bin 0 -> 4404199 bytes
 .../DefaultScaffolder.phar.REMOVED.git-id          |     1 -
 .../WebPICmdLine/Microsoft.Web.Deployment.dll      |   Bin 0 -> 1201991 bytes
 .../Microsoft.Web.Deployment.dll.REMOVED.git-id    |     1 -
 cake/app/vendors/aws.phar                          |   Bin 0 -> 6784935 bytes
 cake/app/vendors/aws.phar.REMOVED.git-id           |     1 -
 .../tcpdf/fonts/dejavu-fonts-ttf-2.33/status.txt   |  6657 +++++
 .../status.txt.REMOVED.git-id                      |     1 -
 cake/app/vendors/tcpdf/tcpdf.php                   | 28808 +++++++++++++++++++
 cake/app/vendors/tcpdf/tcpdf.php.REMOVED.git-id    |     1 -
 .../img/onboarding-chart/04_shape manager.gif      |   Bin 0 -> 1413721 bytes
 .../04_shape manager.gif.REMOVED.git-id            |     1 -
 cake/app/webroot/img/onboarding-chart/05_share.gif |   Bin 0 -> 1341876 bytes
 .../onboarding-chart/05_share.gif.REMOVED.git-id   |     1 -
 .../js/closure/usage/rhino/javadoc/index-all.html  | 12027 ++++++++
 .../rhino/javadoc/index-all.html.REMOVED.git-id    |     1 -
 cake/app/webroot/js/closure/usage/rhino/js-14.jar  |   Bin 0 -> 1471932 bytes
 .../closure/usage/rhino/js-14.jar.REMOVED.git-id   |     1 -
 cake/app/webroot/js/closure/usage/rhino/js.jar     |   Bin 0 -> 1134765 bytes
 .../js/closure/usage/rhino/js.jar.REMOVED.git-id   |     1 -
 .../js/closure/usage/rhino/testsrc/tests.tar.gz    |   Bin 0 -> 1778543 bytes
 .../rhino/testsrc/tests.tar.gz.REMOVED.git-id      |     1 -
 cake/app/webroot/js/mathquill/font/Symbola.svg     |  5102 ++++
 .../js/mathquill/font/Symbola.svg.REMOVED.git-id   |     1 -
 .../webroot/js/templates/SoyToJsSrcCompiler.jar    |   Bin 0 -> 2154164 bytes
 .../SoyToJsSrcCompiler.jar.REMOVED.git-id          |     1 -
 cake/app/webroot/persona-pages/img/gif-v3.gif      |   Bin 0 -> 1570363 bytes
 .../persona-pages/img/gif-v3.gif.REMOVED.git-id    |     1 -
 .../webroot/persona-pages/img/interactive-gif.gif  |   Bin 0 -> 1434134 bytes
 .../img/interactive-gif.gif.REMOVED.git-id         |     1 -
 cake/build/closure/compiler.jar                    |   Bin 0 -> 6007184 bytes
 cake/build/closure/compiler.jar.REMOVED.git-id     |     1 -
 .../lucidchart-mobile-sliders-landscape-4.png      |   Bin 0 -> 1718536 bytes
 ...t-mobile-sliders-landscape-4.png.REMOVED.git-id |     1 -
 .../lucidchart-mobile-sliders-portrait-4.png       |   Bin 0 -> 1614308 bytes
 ...rt-mobile-sliders-portrait-4.png.REMOVED.git-id |     1 -
 .../Versions/A/OCHamcrestIOS                       |   Bin 0 -> 3671740 bytes
 .../Versions/A/OCHamcrestIOS.REMOVED.git-id        |     1 -
 .../OCMockitoIOS.framework/Versions/A/OCMockitoIOS |   Bin 0 -> 1299132 bytes
 .../Versions/A/OCMockitoIOS.REMOVED.git-id         |     1 -
 .../Versions/A/CrashReporter                       |   Bin 0 -> 1432156 bytes
 .../Versions/A/CrashReporter.REMOVED.git-id        |     1 -
 chart-ios/libFlurry_6.0.0.a                        |   Bin 0 -> 3819300 bytes
 chart-ios/libFlurry_6.0.0.a.REMOVED.git-id         |     1 -
 67 files changed, 52595 insertions(+), 33 deletions(-)

All of these extra files are ones that I want removed.

Why are all these files being changed in my latest commit?

Paul Draper asked Mar 18 '23 06:03


1 Answer

Apparently this is a common misconception about how BFG works. From the documentation:

If something questionable - like a 10MB file, when you're telling The BFG to strip out everything over 5MB - is in a protected commit, it won't be removed, and because it's still there, there's no point deleting it from earlier commits either. If you want the BFG to delete something you need to make sure your current commits are clean.

That doesn't mean "there's no point deleting it earlier, so it won't", which is what I read it as.

It means "there's no point deleting it earlier, so it might not". In any case, it will adhere to the guarantee that the protected commits have the same tree.

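In my case, BFG did delete these blobs from the earlier commits, but the rewritten latest commit then had to add the content back so that the HEAD tree stayed unchanged. That is why the files show up as additions in the diff of the latest commit.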
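
One way to check the "same tree" guarantee is to compare the tree ID of HEAD before and after the rewrite. The commit ID is expected to change (its parents were rewritten and, as the log output in the question shows, a Former-commit-id footer was appended), while the tree ID should not. The following is only a sketch; `tree_of` is a hypothetical helper name, not something git or BFG provides.

```shell
# Sketch: print the ID of the tree that a commit points to.
# `tree_of` is a hypothetical helper, not part of git or BFG.
tree_of() {
    # $1 = repository path, $2 = commit-ish (defaults to HEAD)
    git -C "$1" rev-parse --verify --quiet "${2:-HEAD}^{tree}"
}

# Possible usage inside the mirror clone (requires the BFG jar):
#   before=$(tree_of . HEAD)
#   java -jar ~/bfg-1.12.0.jar -b 1M
#   after=$(tree_of . HEAD)
#   [ "$before" = "$after" ] && echo "HEAD tree unchanged"
```

If the two IDs match, the content at HEAD is byte-for-byte the same as before, even though `git diff --stat HEAD^!` looks very different because the parent commit changed.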

There's a more complete discussion of this on GitHub:

  • https://github.com/rtyley/bfg-repo-cleaner/issues/49
  • https://github.com/rtyley/bfg-repo-cleaner/issues/53
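
To act on the documentation's advice that current commits be clean, it can help to first see which files at HEAD exceed the size threshold, since those are the blobs BFG will protect. A minimal sketch, assuming a git new enough to support `git -C`; `large_files_at` is a hypothetical helper name.

```shell
# Sketch: list files in a commit's tree that are larger than a threshold.
# `large_files_at` is a hypothetical helper, not part of git or BFG.
large_files_at() {
    # $1 = repository path, $2 = threshold in bytes, $3 = commit-ish (default HEAD)
    # `ls-tree -l` prints: <mode> <type> <object> <size> TAB <path>
    git -C "$1" ls-tree -r -l "${3:-HEAD}" |
        awk -F '\t' -v min="$2" '{
            split($1, meta, " ")   # meta[2] = type, meta[4] = size in bytes
            if (meta[2] == "blob" && meta[4] + 0 > min + 0)
                print meta[4] "\t" $2
        }'
}

# Possible usage inside the mirror clone:
#   large_files_at . $((1024 * 1024))
```

Each output line is a size in bytes, a tab, and the path. Files listed here would have to be removed in a new commit (made in a normal working clone and pushed) before BFG will strip them from history.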

EDIT

I found a way to remove large blobs except for those present on HEAD.

This uses bash and standard Unix utilities to list every blob over 1 MB (change 1024 * 1024 for a different threshold) that is not in the HEAD tree, and then removes those blobs with BFG:

comm -23 \
    <(git rev-list --objects --all | git cat-file  --batch-check="%(objecttype) %(objectname) %(objectsize) %(rest)" | grep ^blob | awk '$3 > 1024 * 1024 { print $2 }' | sort) \
    <(git ls-tree -r HEAD | cut -f 1 | cut -d ' ' -f 3 | sort) \
    > /tmp/large-blobs.list
java -jar bfg-1.12.0.jar -bi /tmp/large-blobs.list
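
Before handing a list of IDs to BFG, it may be worth reviewing which paths and sizes those IDs correspond to. The following sketch applies the same selection as the pipeline above (blobs over a threshold, reachable from any ref, and not in the HEAD tree) but prints size, blob ID, and path. `stale_large_blobs` is a hypothetical helper name, and the temporary file is an implementation detail of this sketch.

```shell
# Sketch: show size, ID and path of large blobs that are not in HEAD's tree.
# `stale_large_blobs` is a hypothetical helper, not part of git or BFG.
stale_large_blobs() {
    # $1 = repository path, $2 = threshold in bytes
    local keep
    keep=$(mktemp)
    # Blob IDs present in HEAD's tree; BFG protects these by default.
    git -C "$1" ls-tree -r HEAD | cut -f 1 | cut -d ' ' -f 3 | sort -u > "$keep"
    # Every object reachable from any ref, annotated with type and size.
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk -v min="$2" -v keep="$keep" '
            BEGIN { while ((getline id < keep) > 0) protected[id] = 1 }
            $1 == "blob" && $3 + 0 > min + 0 && !($2 in protected) {
                path = $0
                sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)   # drop type, id, size
                print $3 "\t" $2 "\t" path
            }'
    rm -f "$keep"
}

# Possible usage inside the mirror clone:
#   stale_large_blobs . $((1024 * 1024))
#   stale_large_blobs . $((1024 * 1024)) | cut -f 2 > /tmp/large-blobs.list
```

The second usage line should yield the same set of IDs as the `comm` pipeline, just unsorted.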
Paul Draper answered Mar 20 '23 05:03