
Difference between git filter-branch and git subtree?

I was searching through SO for an answer to this and came across an older thread that didn't seem to give any answers. I'm reviving the question in the hope that someone knows!

Can someone tell me the difference between git subtree and git filter-branch? I'll use the same example as in the original question:

git subtree split --prefix=some_subdir -b some_branch

git filter-branch --subdirectory-filter some_subdir some_branch
Dorian McAllister asked Aug 03 '16 05:08



2 Answers

2016: Yes, git subtree (a contrib/ shell script) can be used to split repos, as described in "Using Git subtrees for repository separation" by Stu Campbell.

You need to remove the code that you have duplicated in your split folder, though (see also theamk's answer):

git subtree split --prefix=path/to/code -b split
git push ~/shared/ split:master
git rm -r path/to/code
git commit -am "Remove split code."

That differs from git filter-branch (a native Git command) which rewrites the repo history, picking up only those commits that actually affect the content of a specific subdirectory.

Meaning: there is no code to git rm once the filter-branch has been run.
git filter-branch does not duplicate commits like git subtree split does: it deletes ("filters out") everything that does not match a certain criterion (here a subfolder path).
Again, see theamk's answer for updates: there is no duplication when using a new branch: git subtree split --prefix=some_subdir -b some_branch.
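To see the "filters out" behaviour concretely, here is a minimal sketch on a throwaway repository. The file names, commit messages and the temporary directory are made up for illustration; only the filter-branch invocation comes from the question. It assumes a Git version that still ships filter-branch.

```shell
#!/bin/sh
# Sketch: run the question's filter-branch command on a scratch repo.
# All paths and names here are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Three commits: two touch some_subdir, one does not.
mkdir some_subdir
echo "one" > some_subdir/a.txt
git add . && git commit -q -m "add some_subdir/a.txt"
echo "unrelated" > README
git add . && git commit -q -m "add README (outside some_subdir)"
echo "two" >> some_subdir/a.txt
git commit -q -am "edit some_subdir/a.txt"

# Remember the original branch, then filter a separate branch
# so the original keeps its full history.
orig_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b some_branch

# The environment variable silences the deprecation notice that
# recent Git versions print before running filter-branch.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --subdirectory-filter some_subdir some_branch >/dev/null 2>&1

orig_commits=$(git rev-list --count "$orig_branch")
filtered_commits=$(git rev-list --count some_branch)
filtered_files=$(git ls-tree -r --name-only some_branch)

echo "commits on $orig_branch: $orig_commits"
echo "commits on some_branch: $filtered_commits"
echo "files on some_branch: $filtered_files"
```

On some_branch the README commit should be gone and a.txt should sit at the repository root, so there is nothing left to git rm on that branch. The original branch is untouched because the filter was only pointed at some_branch.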


Update 2021:

  • Do use git switch some_branch or git switch -c some_branch, instead of the old and confusing git checkout command.

  • Do consider the newer git filter-repo: the git filter-branch man page itself now warns against using filter-branch and recommends git filter-repo, which also covers the use cases of BFG.
    (See git filter-branch man page)

git filter-repo can extract wanted paths and their history (stripping everything else)

 git switch -c some_branch
 git filter-repo --path some_subdir/ --refs some_branch
VonC answered Sep 24 '22 20:09


When executed as written, the differences are pretty minor:

  • your "subtree split" command will start from HEAD and put the result in some_branch, which must not exist beforehand
  • your "filter-branch" command will start from some_branch and put the result back in some_branch, overwriting some_branch with the new content.
  • In my tests, "git filter-branch" was ~50x faster (on a very old repo with only a few commits touching the selected path)

In other words, the two snippets below are exactly equivalent, as long as special subtree rejoin commits are not found.

git subtree split --prefix=some_subdir -b some_branch
git checkout some_branch

and

git checkout -b some_branch
git filter-branch --subdirectory-filter some_subdir some_branch
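If you want to check the equivalence yourself, something like the following sketch should do it. It builds a scratch repo (all names hypothetical), runs both commands onto two separate branches, and compares the resulting tree and commit count. It compares trees rather than commit hashes, and it skips the comparison when git subtree is not installed, since it is a contrib script and not present everywhere.

```shell
#!/bin/sh
# Sketch: compare the output of "subtree split" and "filter-branch"
# on a scratch repo. Branch and file names are made up for illustration.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir some_subdir
echo "one" > some_subdir/a.txt
git add . && git commit -q -m "add some_subdir/a.txt"
echo "unrelated" > README
git add . && git commit -q -m "add README (outside some_subdir)"
echo "two" >> some_subdir/a.txt
git commit -q -am "edit some_subdir/a.txt"

# The tree both tools are expected to end up with.
expected_tree=$(git rev-parse HEAD:some_subdir)

# git subtree is a contrib script, so check that it is installed.
have_subtree=no
if [ -x "$(git --exec-path)/git-subtree" ] || command -v git-subtree >/dev/null 2>&1; then
  have_subtree=yes
fi

if [ "$have_subtree" = yes ]; then
  # Variant 1: subtree split starts from HEAD and creates split_branch.
  git subtree split --prefix=some_subdir -b split_branch >/dev/null 2>&1

  # Variant 2: filter-branch rewrites an existing branch in place.
  git branch filter_branch
  FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --subdirectory-filter some_subdir filter_branch >/dev/null 2>&1

  split_tree=$(git rev-parse 'split_branch^{tree}')
  filter_tree=$(git rev-parse 'filter_branch^{tree}')
  split_count=$(git rev-list --count split_branch)
  filter_count=$(git rev-list --count filter_branch)

  echo "subtree split: tree $split_tree, $split_count commits"
  echo "filter-branch: tree $filter_tree, $filter_count commits"
else
  echo "git subtree is not installed; skipping the comparison"
fi
```

This repo has no subtree rejoin commits, so both branches should point at the same tree with the same number of commits.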

Why bother with "git subtree" then, you may ask? For the --rejoin and --onto options: they support a very specific workflow which the original author was using.

theamk answered Sep 22 '22 20:09