Remove sensitive files and their commits from Git history

I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for capistrano).

I know I can add these filenames to .gitignore, but this would not remove their history within Git.

I also don't want to start over again by deleting the /.git directory.

Is there a way to remove all traces of a particular file in your Git history?

Stefan asked May 16 '09 14:05


2 Answers

For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.


With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:

Note for Windows users: use double quotes (") instead of singles in this command

git filter-branch --index-filter \
  'git update-index --remove PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' <introduction-revision-sha1>..HEAD
git push --force --verbose --dry-run
git push --force

Update 2019:

This is the current code from the FAQ:

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \
  --prune-empty --tag-name-filter cat -- --all
git push --force --verbose --dry-run
git push --force
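As a rough sketch of what the 2019 command does, the script below builds a throwaway repository, commits a hypothetical secrets.txt, rewrites history, and then counts the commits on branches and tags that still touch the file. The file names, the demo identity and the temporary directory are assumptions made for illustration. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "password=hunter2" > secrets.txt
echo "hello" > README
git add secrets.txt README
git commit -q -m "Add README and secrets"
echo "more" >> README
git commit -q -a -m "Update README"

# Rewrite every ref, dropping secrets.txt from each commit's index.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch secrets.txt" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Count commits on branches and tags that still touch the file.
remaining=$(git log --oneline --branches --tags -- secrets.txt | wc -l | tr -d ' ')
echo "commits still touching secrets.txt: $remaining"
```

The check uses --branches --tags rather than --all because filter-branch keeps a backup of the old refs under refs/original/, and those still point at the unrewritten commits until you delete them.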

Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.

To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.

Tip: Execute git rebase --interactive


In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:

git commit -a --amend 

That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:

git rebase -i origin/master 

That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:

$EDITOR file-to-fix
git commit -a --amend
git rebase --continue

Repeat this for each change with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
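The edit, amend and continue cycle can be tried end to end in a throwaway repository. This is only a sketch under a few assumptions: the file names and the password are made up, GIT_SEQUENCE_EDITOR with sed stands in for changing "pick" to "edit" by hand, and printf stands in for fixing the file in $EDITOR.

```shell
#!/bin/sh
set -eu

# Throwaway repository; file names and the password are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

printf 'set :user, "deploy"\nset :password, "hunter2"\n' > deploy.rb
git add deploy.rb
git commit -q -m "Add deploy config"   # the commit that needs fixing

echo "more" >> README
git commit -q -a -m "Update README"     # a later, unrelated commit

# Stand-in for the editor step: mark the first listed commit as "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" \
  git rebase -i HEAD~2 >/dev/null 2>&1

# Git has stopped at the marked commit: fix the file, amend, carry on.
printf 'set :user, "deploy"\n' > deploy.rb
git commit -q -a --amend --no-edit
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

# Count lines mentioning the password in the history reachable from HEAD.
leaks=$(git log -p | grep -c "hunter2" || true)
commits=$(git rev-list --count HEAD)
echo "lines mentioning the password: $leaks, commits: $commits"
```

The original, unamended commit is no longer on the branch, but it stays in the local reflog until it expires, which is another reason to do this before pushing.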

natacado answered Oct 12 '22 22:10


Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.

Create a private.txt file listing the passwords, etc, that you want to remove (one entry per line) and then run this command:

$ java -jar bfg.jar --replace-text private.txt my-repo.git

All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive 
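Whichever tool you use, one way to check whether a string is still present in history is Git's pickaxe search (git log -S). The sketch below is not part of the BFG; it uses a hypothetical settings.ini and a made-up password in a throwaway repository to show that blanking a secret in a later commit leaves it findable.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the file name and password are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'password=hunter2\n' > settings.ini
git add settings.ini
git commit -q -m "Add settings"

# Blanking the secret in a new commit does not remove it from history.
printf 'password=\n' > settings.ini
git commit -q -a -m "Blank out password"

# -S lists commits that changed how many times the string occurs.
hits=$(git log --all --oneline -S"hunter2" | wc -l | tr -d ' ')
echo "commits that added or removed the string: $hits"
```

Running the same search after a cleanup gives a quick indication of whether any ref still leads to the secret.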

The BFG is typically 10-50x faster than running git-filter-branch and the options are simplified and tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley answered Oct 12 '22 23:10