Most of the git filter-branch examples I've seen that remove files do so based on filename. I don't necessarily want to do that. Instead, I've identified a number of blob (not commit) SHA1s of the files I want to remove, regardless of where they are in the repository. (Due to our repo history, files tend to move around a lot without changing.)
What's the best way to tell git filter-branch to remove files based on their blob SHA1?
Your task is to remove blobs from Git history by hash identifier. You may find it faster and easier to use the BFG rather than git-filter-branch, specifically using the --strip-blobs-with-ids flag:
-bi, --strip-blobs-with-ids <blob-ids-file>    ...strip blobs with the specified Git object ids
Carefully follow the usage instructions; the core part is just this:
$ java -jar bfg.jar --strip-blobs-with-ids <blob-ids-file> my-repo.git
Note that <blob-ids-file> should contain Git object ids, rather than plain SHA-1 hashes of the blobs' contents.
For a given file, you can calculate the Git object id with git hash-object:
$ git hash-object README.md
a63b49c2e93788cd71c81015818307c7b70963bf
You can see that this value is different to a simple SHA-1 hash:
$ sha1sum README.md
7b833f7b37550e2df719b57e8c4994c93a865aa9 README.md
...that's because the Git object id hashes a Git header, along with the contents of the file, even though it does use the same SHA-1 algorithm.
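If you want to convince yourself of that, here is a rough sketch that rebuilds a blob id by hand. It assumes the header has the form "blob <size-in-bytes>" followed by a NUL byte, and that sha1sum is available as in the example above. blob_id is just a helper name I made up for illustration, it is not a Git command.

```shell
#!/bin/sh
# Sketch: recompute a Git blob id without Git.
# Assumption: the id is the SHA-1 of "blob <size>\0" followed by the file bytes.
# blob_id is a hypothetical helper, not part of Git or the BFG.
blob_id() {
    # Byte count of the file; tr strips the padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Hash the header and the contents as one stream, keep only the hex digest.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

# Try it on a throwaway file containing the single line "hello".
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
id=$(blob_id "$tmp")
echo "$id"
rm -f "$tmp"
```

For that sample file this should print ce013625030ba8dba906f756967f9e9ca394464a, which is the same id git hash-object reports for a file containing just "hello" and a newline. In practice you would still use git hash-object to build the ids file; the sketch only shows where the difference from a plain sha1sum comes from.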
The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.
Full disclosure: I'm the author of the BFG Repo-Cleaner.