
How to use git filter-branch to remove a file by blob SHA1?

Most of the git filter-branch examples I've seen that are removing files have been to remove files based on filename. I don't necessarily want to do that. Instead, I've identified a number of blob (not commit) SHA1s of the files I want to remove, regardless of where they are in the repository. (Due to our repo history, files tend to move around a bunch without changing.)

What's the best way to tell git filter-branch to remove files based on their blob SHA1?

asked Jan 04 '23 13:01 by R.M.


1 Answer

Your task is to remove blobs from Git history by a hash identifier. You may find it faster and easier to use the BFG rather than git-filter-branch, specifically using the --strip-blobs-with-ids flag:

-bi, --strip-blobs-with-ids <blob-ids-file> ...strip blobs with the specified Git object ids

Carefully follow the usage instructions; the core part is just this:

$ java -jar bfg.jar --strip-blobs-with-ids <blob-ids-file> my-repo.git

Note that the <blob-ids-file> file should contain Git object ids, rather than plain SHA-1 hashes of the blobs' contents.

For a given file, you can calculate the Git object id with git hash-object:

$ git hash-object README.md
a63b49c2e93788cd71c81015818307c7b70963bf

You can see that this value is different to a simple SHA-1 hash:

$ sha1sum README.md
7b833f7b37550e2df719b57e8c4994c93a865aa9  README.md

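...that's because the Git object id is the SHA-1 of a short Git header (the word blob, a space, the content size in bytes, and a NUL byte) followed by the contents of the file. The algorithm is the same SHA-1, but the input being hashed is different.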

The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

answered Jan 07 '23 15:01 by Roberto Tyley