The resources below describe how to remove sensitive data from a git repository.
Afterward, how do I double-check that the naughty bits are really gone, i.e., search all blobs in the repository (be they referenced, garbage, packed, loose, or otherwise) to verify that the offending pattern has been utterly destroyed?
Does the answer change when working with a bare repository versus one with a work tree?
If you commit sensitive data, such as a password or SSH key, into a Git repository, you can remove it from the history. To entirely remove unwanted files from a repository's history, you can use either the git filter-repo tool or the open source BFG Repo-Cleaner tool.
To delete a Git repository while keeping its files, run the rm command with the -f and -r switches to recursively remove the .git folder and everything it contains. All of the other files and folders in the project remain untouched.
On GitHub.com, you can access your project history by selecting the commits button from the Code tab of your project. Locally, you can use git log, which lists all of the commits on your current branch. By default, git log presents a lot of information at once.
According to that GitHub page, any commit may be referenced via SHA1, even if no ref points to it, so you must delete the repository and recreate it. I can verify that a commit is still visible at least two weeks after it has been dereferenced. In general, once you have removed the sensitive data, so that it is not accessible via any ref, the simplest way to prune Git’s object store is to clone the repository and destroy the old one. This is especially true if you do not have direct access to the repository, such as on GitHub.
(In other words: if the garbage SHA1 is known, then GitHub will happily serve it over the web. The Git protocol will normally refuse to give you unnamed commits, but it can be enabled with the daemon.uploadarch config.)
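As a rough sketch of the clone-and-replace approach, assuming you have filesystem access to the repository: the function name clone_reachable_only and both path arguments are hypothetical. The --no-local option makes Git use its regular transport even for a local path, so only objects reachable from the cloned refs should be transferred, rather than the object store being copied or hard-linked wholesale.

```shell
# Sketch only: clone_reachable_only is a hypothetical helper name.
# $1 = existing repository, $2 = destination path for the fresh copy.
clone_reachable_only() {
    # --bare copies branches and tags without creating a work tree.
    # --no-local forces the regular Git transport for a local path, so
    # objects are selected by reachability instead of being copied as files.
    git clone --quiet --bare --no-local "$1" "$2"
}
```

Once the new copy has been checked, the old repository can be destroyed. Refs that still point at the sensitive history (for example backups left under refs/original/ by filter-branch) have to be removed first, or the clone may carry the data along.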
The way to turn referenced objects into garbage objects is with judicious application of rebase, filter-branch, reflog, update-ref and the like. The way to purge garbage objects is with judicious application of gc, fsck, prune, and repack.
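A minimal sketch of the purge step, assuming the sensitive commits are already unreachable from every branch, tag, and stash: the function name purge_unreachable and its path argument are hypothetical. It expires the reflogs, which otherwise keep old commits alive, and then runs gc with the pruning grace period disabled.

```shell
# Sketch only: purge_unreachable is a hypothetical helper name.
# $1 = path to the repository (bare or with a work tree).
purge_unreachable() {
    # Drop every reflog entry so rewritten commits are no longer referenced.
    git -C "$1" reflog expire --expire=now --expire-unreachable=now --all
    # Repack and delete unreachable objects immediately instead of waiting
    # for the default grace period.
    git -C "$1" gc --quiet --prune=now
}
```

This is destructive: anything recoverable only through the reflog is gone afterward, so it is worth running on a copy first.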
Example queries:
List dangling commits (garbage that has not yet been collected) and grep each one for sensitive data:
git fsck --no-reflogs | awk '/dangling commit/{print $3}' | while read -r sha1;
do git grep foo "$sha1"; done
List every single object reachable from a ref (add --walk-reflogs for reflogs instead):
git rev-list --objects --all | while read -r sha path;
do git show "$sha" | grep baz; done
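Neither query above visits objects that are unreachable but not dangling. One way to cover everything in the object store, whether referenced, garbage, packed, or loose, is to enumerate it with git cat-file --batch-all-objects. The following is a sketch under that assumption; the function name scan_all_objects and its arguments are hypothetical.

```shell
# Sketch only: scan_all_objects is a hypothetical helper name.
# $1 = repository path (bare or not), $2 = extended regular expression.
# Prints "<sha> <type>" for every object whose raw content matches.
scan_all_objects() {
    git -C "$1" cat-file --batch-all-objects \
        --batch-check='%(objectname) %(objecttype)' |
    while read -r sha type; do
        # -a treats binary content as text; -q only tests for a match.
        if git -C "$1" cat-file "$type" "$sha" | grep -aqE -- "$2"; then
            printf '%s %s\n' "$sha" "$type"
        fi
    done
}
```

Empty output from something like scan_all_objects /path/to/repo 'hunter2' suggests the pattern is gone from that object store. It says nothing about other clones, forks, or hosted copies.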
Another way is to use fast-export to export the entire repository into a text-based file, which you can pick through and manipulate with any tool you want, then fast-import into a fresh repo. This is good because it doesn’t carry any garbage, and you can grep the whole archive very easily.
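The round trip might look roughly like this. The function name export_and_reimport, the dump file, and both repository paths are hypothetical, and the sketch assumes every ref you want to keep is covered by --all.

```shell
# Sketch only: export_and_reimport is a hypothetical helper name.
# $1 = source repository, $2 = dump file to write, $3 = new repository path.
export_and_reimport() {
    # Write all history reachable from any ref as a fast-import stream.
    git -C "$1" fast-export --all > "$2" || return 1
    # Inspect or edit the dump here, e.g.: grep -an 'pattern' "$2"
    git init --quiet "$3" || return 1
    # Replay the stream; the work tree is not populated until a checkout.
    git -C "$3" fast-import --quiet < "$2"
}
```

Because the stream is built only from reachable history, unreachable objects in the source should never make it into the dump or the new repository.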
The answer does not change if you do not have a work tree, but commands like filter-branch may want a work tree for some use cases.