
git merge branches with different directory structures

Tags: git, merge

I am somewhat new to git. I've been using it for a number of months, and I'm comfortable doing most of the basic tasks, so I think it's time to take on some more complicated ones. At my work, we have a few people updating older code; this involves actual code work and restructuring the directories to be more modular. My question is: can these two things be done in parallel branches and then merged or rebased? My intuition says no, because a directory restructure is a rename, and git renames by adding a new file and deleting the old one (at least this is how I understand it). But I wanted to be sure.
Here's the scenario: parent-branch looks like:

├── a.txt
├── b.txt
└── c.txt

Then we create two branches from it, say branchA and branchB. In branchB we modify the structure:

├── lib
│   ├── a.txt
│   └── b.txt
└── test
    └── c.txt

Then in branchA we update a.txt, b.txt, and c.txt.

Is there some way to merge the changes made in branchA with the new structure in branchB? Rebase comes to mind; however, I don't think lib/a.txt is actually connected to a.txt after a git mv...

Jameson

jmerkow asked Mar 22 '14 19:03



1 Answer

First, a short note: you can always try a merge, then back it out, to see what it does:

$ git checkout master
Switched to branch 'master'
$ git status

(Make sure it's clean: backing out of a failed merge when there are uncommitted changes is not fun.)

$ git merge feature 

If the merge fails:

$ git merge --abort 

If the automatic merge succeeds, but you don't want to keep it just yet:

$ git reset --hard HEAD^ 

(Remember that HEAD^ is the first parent of the current commit, and the first parent of a merge is "what was there before the merge". Thus, if the merge worked, HEAD^ is the commit just before the merge.)
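As a rough sketch of the try-then-back-out routine above, the script below builds a throwaway repository (the scratch directory, file name, and commit messages are made up for illustration), attempts a merge, and undoes it either way. It assumes a reasonably recent git and a POSIX shell.

```shell
#!/bin/sh
# Sketch: attempt a merge in a scratch repository, then back it out.
set -eu
cd "$(mktemp -d)"                          # hypothetical scratch location
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo "feature change" > file.txt
git commit -q -am "feature edits file.txt"

git checkout -q master
echo "master change" > file.txt
git commit -q -am "master edits file.txt"
before=$(git rev-parse HEAD)

# Both sides rewrote the same line, so this particular merge conflicts.
if git merge --no-edit feature >/dev/null 2>&1; then
    merge_result=merged
    git reset -q --hard HEAD^      # merge worked: drop the merge commit
else
    merge_result=conflict
    git merge --abort              # merge failed: restore the pre-merge state
fi

after=$(git rev-parse HEAD)
status=$(git status --porcelain)
echo "merge result: $merge_result"
```

Either way, master ends up at the commit it started on with a clean work tree.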


Here's a simple recipe for finding out what renames git merge will automatically detect.

  1. Make sure diff.renamelimit [1] is 0 and diff.renames is true:

    $ git config --get diff.renamelimit
    0
    $ git config --get diff.renames
    true

    If these are not already set this way, set them. (This affects the diff step below.)

  2. Choose which branch you're merging-into, and which you're merging-from. That is, you are going to do something like git checkout master; git merge feature soon; we need to know the two names here. Find the merge base between them:

    $ into=master from=feature
    $ base=$(git merge-base $into $from); echo $base

    You should see some 40-character SHA-1, like ae47361... or whatever here. (Feel free to type out master and feature instead of $into and $from everywhere here. I am using the variables so that this is a "recipe" instead of an "example".)

  3. Compare the merge base against both $into and $from to see which files are detected as "renames":

    $ git diff --name-status $base $into
    R100    fileB   fileB.renamed
    $ git diff --name-status $base $from
    R100    fileC   fileD

(You might want to save the output of these diffs to two files and peruse them later. Side note: you can get the effect of the second diff with the special syntax git diff --name-status master...feature: the three dots here mean "find the merge base and compare it against the right-hand side".)

The two output sections have a list of files Added, Deleted, Modified, Renamed, and so on (this example has just the two renames, with 100% matches).

Since $into is master, the first list is what git thinks has already happened in master. (These are the changes git "wants to keep", when you merge-in feature.)

Meanwhile, $from is feature, so the second list is what git thinks happened in feature. (These are the changes git wants to "now add to master", when you do the merge.)

At this point, you have to do a bunch of work:

  • Files marked R, git will detect as renamed.
  • If the two R lists are the same in both branches, you may be all good (but read on anyway). If there are Rs in the first list that are not in the second ... well, see below.
  • When you run git checkout master; git merge feature (or git checkout $into; git merge $from) git will do the renames shown in the second list, in order to "add those changes" to master.
  • In any case, compare this with the files you want git to detect as renamed. Look for D and A entries that you wanted to have show up as R entries: these occur when, in one of the branches, you not only renamed the file, but also changed the contents so much that git no longer detects the rename.

If the second list does not show everything you want to see, you're going to have to help git out. See even longer description below.
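The recipe above can be scripted. The sketch below builds a small scratch repository (file names and contents are invented for the demonstration), then lists the detected renames on each side plus the D and A entries on the merge-from side that may be undetected renames. Passing -M forces rename detection regardless of the diff.renames setting, and --diff-filter selects which status letters to show.

```shell
#!/bin/sh
# Sketch: list detected renames on both sides, and suspicious D/A pairs.
set -eu
cd "$(mktemp -d)"                          # hypothetical scratch location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "alpha content" > fileA
echo "bravo content" > fileB
echo "charlie content" > fileC
git add .
git commit -q -m "base"

git checkout -q -b feature
git mv fileC fileD                          # clean rename, will show as R100
git rm -q fileA                             # stands in for an undetected rename:
echo "completely unrelated text" > fileE    # old name deleted, new file added
git add fileE
git commit -q -m "feature: rename fileC, replace fileA with fileE"

git checkout -q master
git mv fileB fileB.renamed
git commit -q -m "master: rename fileB"

into=master from=feature
base=$(git merge-base "$into" "$from")

into_renames=$(git diff -M --name-status --diff-filter=R "$base" "$into")
from_renames=$(git diff -M --name-status --diff-filter=R "$base" "$from")
from_suspects=$(git diff -M --name-status --diff-filter=AD "$base" "$from")

printf 'renames in %s:\n%s\n' "$into" "$into_renames"
printf 'renames in %s:\n%s\n' "$from" "$from_renames"
printf 'added/deleted in %s (check for missed renames):\n%s\n' "$from" "$from_suspects"
```

In a real repository only the last block (from into= onward) is needed; the rest just sets up something to look at.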

If the first list has a rename that's not in the second, this may be entirely harmless, or it may cause an "unnecessary" merge conflict and a missed chance for a real merge. Git is going to assume that you intend to keep this rename, and also look at what happened in the merge-from branch ($from, or feature in this case). If the original file was modified there, git will attempt to bring the changes from there into the renamed file. That is probably what you want. If the original file was not modified there, git has nothing to bring in and will leave the file alone. That's also probably what you want. The "bad" case is, again, an undetected rename: git thinks the original file was deleted in branch feature, and a new file with some other name was created.

In this "bad" case, git will give you a merge conflict. For instance, it might say:

CONFLICT (rename/delete): newname deleted in feature and renamed in HEAD. Version HEAD of newname left in tree.
Automatic merge failed; fix conflicts and then commit the result.

The problem here is not that git has retained the file under its new name in master (we probably want that); it's that git may have missed the chance to merge the changes made in branch feature.

Worse—and this might be classifiable as a bug—if the new name occurs in the merge-from branch feature, but git thinks it's a new file there, git leaves us with only the merge-into version of the file in the work tree. The message emitted is the same. Here, I made a few more changes in master to rename fileB to fileE, and on feature, made sure that git would not detect the change as a rename:

$ git diff --name-status $base master
R100    fileB   fileE
$ git diff --name-status $base feature
D       fileB
R100    fileC   fileD
A       fileE
$ git checkout master; git merge feature
CONFLICT (rename/delete): fileE deleted in feature and renamed in HEAD. Version HEAD of fileE left in tree.
Automatic merge failed; fix conflicts and then commit the result.

Note the potentially misleading message, fileE deleted in feature. Git is printing the new name (the master version of the name); that's the name it believes you "want" to see. But it is file fileB that was "deleted" in feature, replaced by an entirely new fileE.

(git-imerge, mentioned below, may be able to handle this particular case.)


[1] There's also a merge.renameLimit (spelled with lowercase limit in the source, but these configuration variables are case-insensitive) that you can set separately. Setting these to 0 tells git to use "a suitable default", which has changed over the years as CPUs have gotten faster. If a separate merge rename limit is not set, git uses the diff rename limit, and again a suitable default if that's not set or is 0. If you set them differently, though, merge and diff will detect renames in different cases.

You can also now set the "rename threshold" in a recursive merge with -Xrename-threshold=, e.g., -Xrename-threshold=50%. The usage here is the same as for git diff's -M option. This option first appeared in git 1.7.4.
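To see the threshold at work, here is a small sketch under made-up conditions: a ten-line file is moved and six of its lines are rewritten on feature, which should leave it below the 50% default but above 25%. The file names, contents, and the 25% figure are illustrative choices, not recommendations.

```shell
#!/bin/sh
# Sketch: a rename missed at the default threshold but found at a lower one.
set -eu
cd "$(mktemp -d)"                          # hypothetical scratch location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do echo "original line $i"; done > notes.txt
git add notes.txt
git commit -q -m "base"

git checkout -q -b feature
mkdir docs
git mv notes.txt docs/notes.txt
# Keep the first four lines, rewrite the remaining six.
{ head -n 4 docs/notes.txt
  for i in 5 6 7 8 9 10; do echo "rewritten line $i"; done; } > tmp
mv tmp docs/notes.txt
git commit -q -am "feature: move and heavily rewrite notes.txt"

git checkout -q master
sed '1s/.*/edited on master/' notes.txt > tmp
mv tmp notes.txt
git commit -q -am "master: edit first line"

base=$(git merge-base master feature)
at_default=$(git diff -M50% --name-status "$base" feature)   # expect A and D lines
at_lower=$(git diff -M25% --name-status "$base" feature)     # expect one R line
printf 'at 50%%:\n%s\nat 25%%:\n%s\n' "$at_default" "$at_lower"

# Ask the merge to use the same lower threshold.
git merge -q --no-edit -Xrename-threshold=25% feature
first_line=$(head -n 1 docs/notes.txt)
last_line=$(tail -n 1 docs/notes.txt)
```

With the lower threshold the merge treats docs/notes.txt as the renamed notes.txt, so the edit made on master lands in the moved file.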


Let's say you are on branch master, and you do git merge 1234567 or git merge otherbranch. Here's what git does:

  1. Find the merge-base: git merge-base master 1234567 or git merge-base master otherbranch.

    This yields a commit-ID. Let's call that ID B, for "Base". Git now has three specific commit IDs: B, the merge base; the commit ID of the tip of the current branch master; and the commit ID you gave it, 1234567 or the tip of branch otherbranch. Let's just draw these in terms of the commit graph, for completeness; let's say it looks like this:

    A - B - C - D - E       <-- master
          \
            F - G - H - I   <-- otherbranch

    If all goes well, git will produce a merge commit that has E and I as its two parents, but we want to concentrate here on the resulting work tree rather than the commit graph.

  2. Given these three commits (B, E, and I), git computes two diffs, a la git diff:

    git diff B E
    git diff B I

    The first is the set of changes made on master, and the second is the set of changes made on otherbranch, in this case.

    If you run git diff manually, you can set the "similarity threshold" for rename detection with -M (see above for setting it during merge). Git's default merge sets automatic rename detection to 50%, which is what you get with no -M option and diff.renames set to true.

If the files are "sufficiently similar" (and "exactly the same" is always sufficient), git will detect renames:

    $ git diff B otherbranch  # I tagged the merge-base `B`
    diff --git a/fileB b/fileB.txt
    similarity index 71%
    rename from fileB
    rename to fileB.txt
    index cfe0655..478b6c5 100644
    --- a/fileB
    +++ b/fileB.txt
    @@ -1,3 +1,4 @@
     file B contains
     several lines of
     stuff.
    +changeandrename

(In this case I just renamed from fileB to fileB.txt but the detection works across directories too.) Let's note that this is conveniently represented by git diff --name-status output:

    $ git diff --name-status B otherbranch
    R071    fileB   fileB.txt

(I should also note here that I have diff.renames set to true and diff.renamelimit = 0 in my global git config.)

  3. Git now attempts to combine the changes from B to I (on otherbranch) with the changes from B to E (on master).

If git is able to detect that lib/a.txt is renamed from a.txt, it will connect them. (And you can preview whether it will by doing a git diff.) In this case the automatic merge result is likely to be what you want, or sufficiently close.
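For the layout in the question, a quick experiment along these lines shows the connection being made. This is only a sketch: the file contents and commit messages are invented, and it relies on the restructuring commit being a pure move so the rename is detected at 100%.

```shell
#!/bin/sh
# Sketch: restructure in branchB, edit in branchA, then merge the two.
set -eu
cd "$(mktemp -d)"                               # hypothetical scratch location
git init -q
git symbolic-ref HEAD refs/heads/parent-branch
git config user.name "Example"
git config user.email "example@example.com"

# parent-branch: flat layout with three distinct files
for f in a b c; do
    for i in 1 2 3 4 5; do echo "$f line $i"; done > "$f.txt"
done
git add .
git commit -q -m "parent: flat layout"

# branchB: move files into lib/ and test/ without changing their contents
git checkout -q -b branchB
mkdir lib test
git mv a.txt b.txt lib/
git mv c.txt test/
git commit -q -m "branchB: restructure"

# branchA: edit the files under their old names
git checkout -q -b branchA parent-branch
for f in a b c; do echo "$f changed in branchA" >> "$f.txt"; done
git commit -q -am "branchA: update a, b, and c"

# Merge branchA's edits into the new structure.
git checkout -q branchB
git merge -q --no-edit branchA
merged_files=$(git ls-files)
echo "$merged_files"
tail -n 1 lib/a.txt
```

The edits made to a.txt, b.txt, and c.txt in branchA end up in lib/a.txt, lib/b.txt, and test/c.txt.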

If not, though, it won't.

When the automatic rename detection fails, there's a way to break up commits (or maybe they are already sufficiently broken-up) step-wise. For instance, suppose in the sequence of F G H I commits, one step (maybe G) simply renames a.txt to lib/a.txt, and other steps (F, H, and/or I) make so many other changes to a.txt (under whatever name) to fool git into not realizing that the file was renamed. What you can do here is increase the number of merges, so that git can "see" the rename. Let's assume for simplicity that F does not change a.txt and G renames it, so that the diff from B to G shows the rename. What we can do is first merge commit G:

git checkout master; git merge otherbranch~2 

Once this merge is complete and git has renamed a.txt to lib/a.txt in the tree for the new merge commit on branch master, we do a second merge to bring in commits H and I:

git merge otherbranch 

This two-step merge causes git to "do the right thing".
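Here is a sketch of that two-step merge in a scratch repository. The commits are labeled F, G, H, and I to match the graph above, but their contents are invented: G is a pure rename, and H rewrites six of the file's ten lines so that the rename should no longer be detected across the whole branch.

```shell
#!/bin/sh
# Sketch: merge the rename commit first, then the rest of the branch.
set -eu
cd "$(mktemp -d)"                          # hypothetical scratch location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do echo "original line $i"; done > a.txt
git add a.txt
git commit -q -m "B: add a.txt"

git checkout -q -b otherbranch
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "F: unrelated file"
mkdir lib
git mv a.txt lib/a.txt
git commit -q -m "G: pure rename"
{ head -n 4 lib/a.txt
  for i in 5 6 7 8 9 10; do echo "rewritten line $i"; done; } > tmp
mv tmp lib/a.txt
git commit -q -am "H: rewrite most of lib/a.txt"
echo "more notes" >> notes.txt
git commit -q -am "I: unrelated change"

git checkout -q master
sed '1s/.*/edited on master/' a.txt > tmp
mv tmp a.txt
git commit -q -am "C: edit a.txt on master"

base=$(git merge-base master otherbranch)
# Across the whole branch the rename is hidden; up to G it is visible.
whole=$(git diff -M --name-status --diff-filter=R "$base" otherbranch)
upto_g=$(git diff -M --name-status --diff-filter=R "$base" otherbranch~2)

git merge -q --no-edit otherbranch~2   # step 1: bring in F and G (the rename)
git merge -q --no-edit otherbranch     # step 2: bring in H and I
first_line=$(head -n 1 lib/a.txt)
last_line=$(tail -n 1 lib/a.txt)
```

After the second merge, lib/a.txt carries both the line edited on master and the lines rewritten in H.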

In the most extreme case, an incremental, commit-by-commit merge sequence (which would be extremely painful to do manually) will pick up everything that could be picked up. Fortunately someone has already written this "incremental merge" program for you: git-imerge. I have not tried this but it's the Obvious Answer for hard cases.

torek answered Oct 16 '22 07:10