Removing private information from old Git commits

I have a project versioned with Git that I'd like to make open source, but it has some private information in it that is specific to the environment in which it was originally used. I'm going to change the information in question to load from a config file which is not included in the repository. I realize I should have done this in the first place. But since the private information still exists in previous commits, how can I go about removing it from my history? Do I just have to start a new repository based on the latest commit and lose all my history, or is there a way to salvage the current repository while removing any record of the private information?

Edit: To clarify, I don't want to completely remove the files that contain this private information, because they are still used. Rather, I want to remove/blank out/change the occurrence of certain strings within them.

Jimmy asked Feb 08 '10 22:02

2 Answers

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing private data from Git repos.

The usage instructions give the steps in more detail, but the core bit is just: download the BFG's jar (needs Java 6 or above) and run this command:

$ java -jar bfg.jar --replace-text replacements.txt my-repo.git

The replacements.txt file should contain all the substitutions you want to make, in a format like this (one entry per line; note that the comments shouldn't be included):

PASSWORD1                       # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex

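Your entire repository history will be scanned, and all non-binary files (under 1MB in size) will have the substitutions performed. Any matching string will be replaced, except in your latest commit, which the BFG protects by default. So commit the change that moves the private values into your config file before running it.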
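The comments in the list above are only explanation and must not go into the file. Below is a sketch of writing the same rules to a clean replacements.txt with a shell heredoc. PASSWORD1 to PASSWORD3 are the placeholder strings from that list, and you would swap in your real values. Keep the file outside the repository, because it contains the very strings you want to erase.

```shell
# Write the BFG replacement rules, one per line, with no comments.
# PASSWORD1..PASSWORD3 are placeholders; substitute your real private strings.
# The quoted 'EOF' stops the shell from touching the backslash in \w+.
cat > replacements.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
EOF
```

Then pass the file to the `java -jar bfg.jar --replace-text replacements.txt my-repo.git` command shown earlier.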

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley answered Sep 19 '22 16:09


I wrote a script for this a little while ago. You can find it here: https://gist.github.com/dound/76ea685c05c4a7895247457eb676fe69

(original writeup viewable from archive.org: https://web.archive.org/web/20160208235904/http://dound.com:80/2009/04/git-forever-remove-files-or-folders-from-history/)

The script builds on the git-filter-branch tool that ships with Git. If you're curious, you can read more about removing files from a Git repo here, but the script linked above should be easy to use and is all you really need to remove that private information.
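The linked script targets removing whole files and folders. The question, however, asks to change strings inside files that are still in use. The same git-filter-branch approach can do that with a tree filter that runs sed on every commit.

Here is a minimal sketch in a throwaway demo repository. It assumes a POSIX shell with git and sed available, and the file name settings.conf and the string hunter2 are hypothetical placeholders. The cleanup at the end follows the "checklist for shrinking a repository" in the git-filter-branch documentation.

```shell
#!/bin/sh
# Demo: commit a hypothetical secret, replace it in a later commit, then
# rewrite every commit so the secret no longer appears anywhere in history.
set -e

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# 'hunter2' stands in for the private string (hypothetical).
echo 'db_password = "hunter2"' > settings.conf
git add settings.conf
git commit -q -m "Add settings with hard-coded password"

echo 'db_password = load_from_config()' > settings.conf
git commit -q -am "Load password from config instead"

# Rewrite all refs. The tree filter runs inside a checkout of each commit;
# new files are added automatically, so the sed backup file is deleted.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter \
  'if [ -f settings.conf ]; then sed -i.bak "s/hunter2/REMOVED/g" settings.conf && rm -f settings.conf.bak; fi' \
  -- --all

# Delete the refs/original backups, expire reflogs, and prune the now
# unreachable old commits so the secret is gone from the object store too.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now
```

The rewritten branches then have to be force-pushed. Note that the git-filter-branch documentation itself now recommends git filter-repo as a faster and safer alternative for this kind of rewrite.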

David Underhill answered Sep 22 '22 16:09