I needed to remove some Xcode files from an old repo that should have been ignored, so I ran the following command:
git filter-branch --index-filter 'git rm -f --cached --ignore-unmatch *mode1v3 *pbxuser' HEAD
My understanding was that adding --cached would not affect the current working directory, but git deleted those matching files too. Luckily I had a backup(!) but I'm curious why it does this, or am I misunderstanding what --cached does?
The --cached flag of git rm removes a file from the index (the staging area) only. The copy in the working directory remains intact, so you still have the file locally; Git simply stops tracking it.
git filter-branch lets you rewrite Git revision history by rewriting the branches mentioned in the <rev-list options>, applying custom filters on each revision. Those filters can modify each tree (e.g. removing a file or running a perl rewrite on all files) or information about each commit.
To entirely remove unwanted files from a repository's history you can use either the git filter-repo tool or the BFG Repo-Cleaner open source tool. The git filter-repo tool and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits.
The git rm command can be used to remove individual files or a collection of files. The primary function of git rm is to remove tracked files from the Git index. Additionally, git rm can be used to remove files from both the staging index and the working directory.
The culprit is not the git rm command. Its --cached option does indeed work as you say. You can easily try that in a small git repo.
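Such a trial could look like the sketch below. It assumes a throwaway repository in a temporary directory; the file name demo.pbxuser and the identity settings are made up for the example.

```shell
# Throwaway repo; demo.pbxuser is a hypothetical file name for this sketch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "local settings" > demo.pbxuser
git add demo.pbxuser
git commit -q -m "Accidentally track a user file"

# Remove the file from the index only.
git rm -q --cached demo.pbxuser

# The file is still on disk, but the index no longer lists it.
ls demo.pbxuser
git ls-files
git status --short
```

Here git status reports a staged deletion plus an untracked file of the same name, which is what "removed from the index, kept in the working directory" looks like.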
Although the man page does not mention it, git filter-branch does not seem to preserve your working area. In fact, the command refuses to run if your working area is not clean, which is already an indication.
But even if the files are gone from the working area, they are not gone from the repo. They are just no longer in any commit reachable from your current branch. filter-branch stores a reference to your branch as it was before rewriting in the refs/original/ namespace.
Use the command git show-ref to see it.
You could check out the old version to access your removed files. Alternatively, you could use the command
git cat-file blob refs/original/refs/heads/master:foo
to get the contents of a file without checking anything out (use the reference shown by show-ref; foo is the name of the desired file). There are plenty of other possibilities.
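Put together, the recovery could be sketched as follows. This assumes a throwaway repository with made-up file names (demo.pbxuser, main.m), and it reads the branch name from HEAD instead of assuming master. FILTER_BRANCH_SQUELCH_WARNING only matters on newer Git versions, where filter-branch otherwise prints a warning and pauses.

```shell
# Throwaway repo with one unwanted file; all names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "local settings" > demo.pbxuser
echo "source" > main.m
git add .
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)

# Rewrite history the same way as in the question.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch "*.pbxuser"' HEAD >/dev/null

# The backup reference shows up under refs/original/.
git show-ref

# Read the removed file's contents from the pre-rewrite history.
recovered=$(git cat-file blob "refs/original/refs/heads/$branch:demo.pbxuser")
echo "$recovered"
```

Redirecting the cat-file output to a file restores the working copy without touching the rewritten branch.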
You can use gitk --all to navigate through both your original and your rewritten branches, and you will see that nothing is really gone.
The behaviour of git-filter-branch can be surprising, as you've discovered, and it won't protect you from unintended consequences when you run it.
Instead I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative specifically designed for deleting files from Git history. One way in which it makes your life easier here is that it will not delete, or change in any way, files in your latest commit.
You should follow the usage instructions - but the core bit is just this: download the BFG's jar (requires Java 6 or above) and run this command:
$ java -jar bfg.jar --delete-files '*{mode1v3,pbxuser}' my-repo.git
Any file matching that expression in your repository history, which isn't also in your latest commit, will be deleted. You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
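To see what this cleanup step does to old data, here is a sketch that does not need the BFG jar: it uses git filter-branch in a throwaway repository as a stand-in for the rewrite. The file names are made up, and deleting the refs/original backup and expiring the reflog are extra steps assumed here, because gc keeps any object that a ref or reflog entry can still reach.

```shell
# Throwaway repo; filter-branch stands in for the history rewrite.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "local settings" > demo.pbxuser
echo "source" > main.m
git add .
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)
blob=$(git rev-parse HEAD:demo.pbxuser)   # object ID of the unwanted file

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch "*.pbxuser"' HEAD >/dev/null

# Drop everything that still points at the old history, then collect.
git update-ref -d "refs/original/refs/heads/$branch"
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

# Check whether the old blob survived the cleanup.
if git cat-file -e "$blob" 2>/dev/null; then
  status="still present"
else
  status="pruned"
fi
echo "old blob is $status"
```

Only run the destructive part on a repository you have a backup of: once the objects are pruned, the recovery route through refs/original is gone as well.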
The BFG is generally much simpler to use than git-filter-branch: its options are tailored around two common use-cases.
Full disclosure: I'm the author of the BFG Repo-Cleaner.