
How do I verify removal of sensitive data from a git repository?

Tags: git

The resources below describe how to remove sensitive data from a git repository.

  • How do I remove sensitive files from git’s history?
  • GitHub Help: Removing sensitive data

Afterward, how do I double-check that the naughty bits are really gone, i.e., search all blobs in the repository (be they referenced, garbage, packed, loose, or otherwise) to verify that the offending pattern has been utterly destroyed?

Does the answer change when working with a bare repository versus one with a work tree?

Asked Mar 14 '11 at 18:03 by Greg Bacon




1 Answer

According to the GitHub Help page linked in the question, any commit may be referenced by its SHA1 even if no ref points to it, so you must delete the repository and recreate it. I can confirm that a commit is still visible at least two weeks after it has been dereferenced. In general, once you have removed the sensitive data so that it is no longer reachable from any ref, the simplest way to prune Git’s object store is to clone the repository and destroy the old one. This is especially true if you do not have direct access to the repository, as on GitHub.
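
The clone-and-destroy step can be sketched roughly as follows. The function name and the two path arguments are hypothetical; the sketch assumes both repositories live on the local filesystem, where a plain path clone would otherwise copy or hardlink the entire object directory, garbage included.

```shell
# clone_without_garbage OLD NEW
# Hypothetical helper: copy only the objects reachable from OLD's
# branches and tags into a new repository at NEW.
clone_without_garbage() {
  old=$1
  new=$2
  # --no-local forces the regular Git transport even for a local path,
  # so only objects reachable from the fetched refs are transferred.
  git clone --quiet --no-local "$old" "$new"
}
```

For example, `clone_without_garbage ./repo ./repo-clean`, followed by a check of the new repository before the old one is deleted. Cloning from a `file://` URL has the same effect as `--no-local`.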

(In other words: if the garbage SHA1 is known, then GitHub will happily serve it over the web. The Git protocol will normally refuse to give you unnamed commits, but serving them can be enabled with the daemon.uploadarch config, which turns on the upload-archive service used by git archive --remote.)

The way to turn referenced objects into garbage objects is with judicious application of rebase, filter-branch, reflog, update-ref and the like. The way to purge garbage objects is with judicious application of gc, fsck, prune, and repack.
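
As a minimal sketch of the purge step, assuming the sensitive commits are already unreachable from every ref and that you have direct access to the repository: the function name below is hypothetical, and the commands discard all reflog history, so they are destructive by design.

```shell
# purge_garbage REPO
# Hypothetical helper: drop every reflog entry, then delete all objects
# that are no longer reachable from any ref.
purge_garbage() {
  repo=$1
  # Reflog entries keep old commits alive; expire all of them immediately.
  git -C "$repo" reflog expire --expire=now --all
  # Repack reachable objects and prune unreachable ones without the
  # usual grace period.
  git -C "$repo" gc --quiet --prune=now
}
```

If filter-branch was used, its backup refs under refs/original/ still point at the old history and need to be deleted (for example with update-ref -d) before this has any effect.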

Example queries:

  • List dangling commits (garbage that has not yet been collected) and grep each one for the sensitive data:

    git fsck --no-reflogs | awk '/dangling commit/{print $3}' | while read -r sha1;
      do git grep foo "$sha1"; done
    
  • List every single object reachable from a ref (add --walk-reflogs for reflogs instead) and grep its contents:

    git rev-list --objects --all | while read -r sha path;
      do git show "$sha" | grep baz; done
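
Neither query above looks at objects that are unreachable but not dangling. One possible way to cover every object in the store (reachable or garbage, loose or packed) is sketched below. It assumes a Git recent enough to support cat-file --batch-all-objects; the function name and arguments are hypothetical.

```shell
# find_in_all_objects PATTERN [REPO]
# Hypothetical helper: print "<type> <sha>" for every object in REPO
# (default: current directory) whose content matches PATTERN.
find_in_all_objects() {
  pattern=$1
  repo=${2:-.}
  # Enumerate every object in the object store, regardless of reachability.
  git -C "$repo" cat-file --batch-all-objects \
      --batch-check='%(objectname) %(objecttype)' |
  while read -r sha type; do
    # -a makes grep treat binary blobs as text; -q only reports a match.
    if git -C "$repo" cat-file -p "$sha" | grep -a -q -e "$pattern"; then
      echo "$type $sha"
    fi
  done
}
```

An empty result from `find_in_all_objects 'offending pattern' /path/to/repo` suggests the pattern is gone from that copy of the repository. Because it reads the object store directly, the same call should work for a bare repository.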
    

Another way is to use fast-export to dump the entire repository into a text-based stream, which you can inspect and manipulate with any tool you like, and then fast-import it into a fresh repo. This is good because the stream only contains what is reachable from the exported refs, so it doesn’t carry any garbage, and you can grep the whole thing very easily.
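
A rough sketch of that round trip, with hypothetical names for the function, the stream file, and the new repository path:

```shell
# export_and_reimport OLD STREAM NEW
# Hypothetical helper: dump every ref of OLD to the file STREAM, then
# rebuild a fresh repository at NEW from that stream.
export_and_reimport() {
  old=$1
  stream=$2
  new=$3
  # --all exports every ref; unreachable objects are not part of the stream.
  git -C "$old" fast-export --all > "$stream"
  git init --quiet "$new"
  # fast-import only writes objects and refs; it does not check out files.
  git -C "$new" fast-import --quiet < "$stream"
}
```

Between the export and the import, the stream can be searched with something like `grep -a 'offending pattern' repo.fi`. After importing, run `git reset --hard` in the new repository if you want a populated work tree.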

The answer does not change if you do not have a work tree, but commands like filter-branch may want a work tree for some use cases.

Answered Sep 27 '22 at 19:09 by Josh Lee