
Is it possible to remove a file from history using interactive git rebase?

There's an old commit in my local repository which added some files, including one called "unwanted.txt". In subsequent commits, that file has been modified, along with others. Is it possible to completely remove the file "unwanted.txt" from history using interactive git rebase? I know it's possible to achieve this using "git filter-branch", but since I am learning git and I want to understand the full potential of "git rebase -i", I wonder if this command can be used for such an operation.

user1945293 asked Sep 08 '15 09:09



2 Answers

You should be able to do this by editing the offending commit (mark it with e or edit in the rebase todo list) and then removing the file like this:

git rm unwanted.txt
git commit --amend
git rebase --continue

This will likely cause conflicts in later commits that change the file, but those should be trivial to resolve: remove the file again and continue the rebase.
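The whole sequence, including the conflict in a later commit, can be tried safely in a throwaway repository. This is only a sketch: the file names, commit messages, the demo identity and the sed-based GIT_SEQUENCE_EDITOR (used here so the todo list is edited without opening an editor) are assumptions made for the example, not part of the question.

```shell
#!/bin/sh
# Throwaway identity and editor settings so the demo runs non-interactively.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true

cd "$(mktemp -d)" && git init -q

echo base > keep.txt
git add keep.txt && git commit -q -m "initial"

echo one > unwanted.txt; echo more >> keep.txt
git add . && git commit -q -m "add files"

echo two >> unwanted.txt; echo again >> keep.txt
git add . && git commit -q -m "modify files"

# Start the rebase and turn the first todo entry ("add files") into "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i HEAD~2

# Stopped at "add files": drop the file and amend the commit.
git rm -q unwanted.txt
git commit -q --amend --no-edit

# Replaying "modify files" hits a modify/delete conflict, so this stops again.
git rebase --continue

# Resolve by removing the file again, then finish the rebase.
git rm -q unwanted.txt
git rebase --continue

# List the commits on the rewritten branch that touch the file.
git log --oneline -- unwanted.txt
```

The changes to keep.txt survive in both rewritten commits; only unwanted.txt is gone from the branch.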

Edit: You will most likely also have to make sure that no branches point to any commits where the unwanted file still exists, and run git gc to flush out unreferenced blobs in the repo. This shouldn't be a problem if it is a purely private repo not shared with anyone else.
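One way to check for leftover references and then flush the old objects is sketched below. It uses a throwaway repository; the branch name backup and the file contents are made up for the example, and a plain commit --amend stands in for the rebase. Expiring the reflogs is included because reflog entries also keep old commits alive.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo base > keep.txt
git add keep.txt && git commit -q -m "initial"
echo secret > unwanted.txt; echo more >> keep.txt
git add . && git commit -q -m "add files"

blob=$(git rev-parse HEAD:unwanted.txt)   # object ID of the file's content
git branch backup                         # stray branch left on the old commit

# Rewrite the tip without the file (a stand-in for the rebase).
git rm -q unwanted.txt
git commit -q --amend --no-edit

# Any ref that still reaches a commit touching the file shows up here.
git log --all --oneline -- unwanted.txt

# Drop the stray branch, expire the reflogs, then prune unreachable objects.
git branch -q -D backup
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then state=present; else state=purged; fi
echo "unwanted.txt blob is $state"
```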

harald answered Oct 18 '22 11:10


It's possible in theory, but in practice it's usually much too painful.

The method is the same in both rebase and filter-branch. It may help if you realize that all that an interactive rebase is, is git cherry-pick on steroids, as it were; and git filter-branch is simply an automated extra-complicated rebase across multiple branches and with merge preservation.

As usual with git, it mostly boils down to manipulating the commit graph, and adding new commits that look like existing commits but with something changed—in this case, the trees attached to those commits. (And as soon as one commit is different, it gets a different SHA-1, which means all subsequent commits must change as well, to list the different SHA-1s that pop into existence as the new graph grows.)
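To see why a changed parent forces a new ID, it helps to look at what a commit object actually records. The sketch below (throwaway repository, made-up file names) prints a commit and then creates a second commit with the same tree and message but without the parent; the two IDs differ even though the snapshot is identical.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo a > a.txt; git add a.txt; git commit -q -m first
echo b > b.txt; git add b.txt; git commit -q -m second

# A commit object lists its tree, its parent(s), author, committer and message.
git cat-file -p HEAD

child=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')

# Same tree and message, but no parent line: a different commit object.
copy=$(echo second | git commit-tree "$tree")

echo "original: $child"
echo "copy:     $copy"
```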

To see how it works, start by drawing the commit graph. How complete the graph needs to be depends on how far back you have to go before the unwanted.txt file no longer appears. But I'll just draw a simple graph, with just one named branch, master:

I - A - B - C - F   <-- master
      \       /
        D - E

Here I is the initial commit; for simplicity let's say it does not have the unwanted file. Let's say instead that this file was introduced in commit A and modified in C and E.

What we need to do is this:

  1. Copy all of commit I (preserving commit author and committer, and date stamps, and so on) while removing the unwanted file, i.e., altering the source tree attached to I if needed. Since I never had the file, nothing changes, so this just gives us commit I back and it retains its original SHA-1.
  2. Copy all of commit A while removing the unwanted file. This results in a new, different commit A' because we change A's tree to a new tree that has the file removed. We get a new SHA-1 cryptographic checksum because the new commit is different from the old. So we save an entry in a map that says "old commit A replaced by new commit A'".
  3. Copy all of commit B while removing the unwanted file. This changes the tree (remember, each commit has a complete snapshot of the entire source, so the unwanted file is in the original B). Make a new commit B' that has the altered tree and has commit A' as its parent ID.
  4. Copy all of commit C while removing the unwanted file, resulting in C'.
  5. Copy all of commit D with our changes, resulting in D'. (Note that we cannot copy F until we've copied all its predecessors in the graph, in this case C and E.)
  6. Copy all of commit E with our changes.
  7. Copy all of commit F with our changes. The new commit F' has C' and E' as its two parents; we find these using the SHA-1 mapping that we've been constructing all along.
  8. Last, change master to point to commit F', abandoning the original commit F.

This results in a graph that looks like this:

    A - B - C - F    [abandoned]
   /  \       /
  /     D - E
 /
I - A' - B' - C' - F'   <-- master
       \         /
         D' - E'
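The eight steps can be sketched by hand with plumbing commands, which makes the old-to-new map explicit. This is an illustration only, not how rebase or filter-branch is actually implemented: it builds the example graph in a throwaway repository, assumes unwanted.txt sits at the top level of the tree, keeps the map as one small file per old commit, and takes author and committer names from the environment instead of copying them per commit. The helper commit(), the branch name side and the file names are invented for the demo.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is called master

# Helper: add one new file per commit so the side branch merges cleanly.
commit() { echo "$1" > "file-$1.txt"; git add -A; git commit -q -m "$1"; }

# Build  I - A - B - C - F  with  D - E  branching off A and merged in F.
commit I
echo v1 > unwanted.txt;  commit A
git branch side
commit B
echo v2 >> unwanted.txt; commit C
git checkout -q side
commit D
echo v3 >> unwanted.txt; commit E
git checkout -q master
git merge -q -X ours -m F side             # -X ours settles the unwanted.txt conflict

root=$(git rev-list --max-parents=0 master)
old_tip=$(git rev-parse master)
map=$(mktemp -d)                           # $map/<old id> holds the new id

# Steps 1-7: copy every commit, parents first.
for old in $(git rev-list --reverse --topo-order master); do
    # Same tree minus the top-level entry named unwanted.txt.
    tree=$(git ls-tree "$old" | awk -F '\t' '$2 != "unwanted.txt"' | git mktree)

    # Translate each old parent into its already-copied replacement.
    parents=
    for p in $(git log -1 --format=%P "$old"); do
        parents="$parents -p $(cat "$map/$p")"
    done

    # Reuse the original message and dates (names come from the environment).
    new=$(git cat-file commit "$old" | sed '1,/^$/d' |
          GIT_AUTHOR_DATE=$(git log -1 --date=raw --format=%ad "$old") \
          GIT_COMMITTER_DATE=$(git log -1 --date=raw --format=%cd "$old") \
          git commit-tree $parents "$tree")
    echo "$new" > "$map/$old"
done

# Step 8: point master at F' and refresh the index and work tree.
git update-ref refs/heads/master "$(cat "$map/$old_tip")"
git reset -q --hard

git log --oneline --graph master
```

Because commit I never contained the file, its copy is byte-for-byte the same object, so the map sends I to itself; every other commit gets a new ID.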

An interactive rebase with --preserve-merges (--rebase-merges in current versions of Git, which have dropped --preserve-merges) can handle this particular case. If there's more than one branch, though, you have to carefully rebase the additional branches with --onto as needed to make use of the new commits, which you have to match up with the old commits, most likely using an SHA-1 map file that you construct manually as you go.
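For a second branch, the --onto step might look like the sketch below. The branch name topic, the helper commit() and the variables old_b and new_b (a one-entry stand-in for the hand-kept map) are assumptions for the example. To keep it non-interactive, master is rewritten with rebase --exec, which removes the file and amends after each replayed commit; an interactive edit would produce the same shape of history.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master

commit() { echo "$1" > "file-$1.txt"; git add -A; git commit -q -m "$1"; }

commit I
root=$(git rev-parse HEAD)
echo v1 > unwanted.txt; commit A
commit B
old_b=$(git rev-parse HEAD)                # map entry, old side
git checkout -q -b topic
commit T                                   # topic-only commit on top of old B
git checkout -q master

# Rewrite master (A -> A', B -> B'): after each replayed commit,
# drop the file if present and amend.
git rebase -q -x \
    'git rm -q --ignore-unmatch unwanted.txt && git commit -q --amend --no-edit' \
    "$root"
new_b=$(git rev-parse HEAD)                # map entry, new side

# topic still sits on the old B; transplant its own commits onto B'.
git rebase -q --onto "$new_b" "$old_b" topic

git log --oneline --graph master topic
```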

There's an additional wrinkle, which is that git commit by default refuses to make "empty" commits, where "empty" is defined as "has the same tree as the previous commit" (and is not a merge). The filter-branch script handles this automatically for you, mapping multiple new commits to a single old commit if you choose to delete empty commits (a commit that only modifies the unwanted file becomes empty when the previous and new commits both give up the unwanted file). An interactive rebase does not handle this very well when preserving merges, so that imposes even more pain.
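The default refusal, and the flag that overrides it, can be seen in a throwaway repository. The commit message here is made up; it stands for a commit whose only change was to the unwanted file.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo base > keep.txt
git add keep.txt && git commit -q -m "initial"

# Nothing is staged, so a plain commit is refused (non-zero exit status).
git commit -q -m "only touched unwanted.txt" >/dev/null 2>&1
refused=$?
echo "plain commit exit status: $refused"

# --allow-empty records the commit anyway, with the same tree as its parent.
git commit -q --allow-empty -m "only touched unwanted.txt"

# Both lines show the same tree ID.
git rev-parse 'HEAD^{tree}' 'HEAD~1^{tree}'
```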

There are some other subtle differences: for instance, when rebase "abandons" a chain of commits, they remain in the "reflog" for the branch that has been rebased, as well as in the reflog for HEAD. The filter-branch script uses a different method: it copies all the references to a sub-name-space, refs/original/. This all matters when you get to the point of wanting to purge the old, abandoned commits: with rebase, you "expire" old references, but with filter-branch, you forcibly remove the originals instead.
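The filter-branch side of that difference can be observed directly. In this sketch (throwaway repository, made-up file contents) the rewritten history no longer has the file, yet the old commit stays reachable through refs/original/ until those refs are deleted. FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and delay that newer versions of Git print; the example assumes git filter-branch is installed.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)" && git init -q
echo base > keep.txt
git add keep.txt && git commit -q -m "initial"
echo secret > unwanted.txt; echo more >> keep.txt
git add . && git commit -q -m "add files"

git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch unwanted.txt' -- --all >/dev/null

# The originals survive under refs/original/, so the file is still reachable.
git for-each-ref refs/original/
before=$(git log --all --oneline -- unwanted.txt)

# Forcibly remove the backup refs.
git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done

after=$(git log --all --oneline -- unwanted.txt)
echo "before: ${before:-none}"
echo "after:  ${after:-none}"
```

Reflog entries are not refs, so they still need to be expired before git gc can actually discard the old objects.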

torek answered Oct 18 '22 12:10