 

How do I merge changes in Git in files that I moved?

Tags: git, merge, egit

I moved some directories.

When I merge, there are many conflicting files, since other developers have committed their changes. Both egit Merge Tool and git mergetool say that the file was deleted locally or remotely. See image.

How do I merge these changes?

[Image: merge tool screenshot]

Joshua Fox asked May 01 '17 09:05




1 Answer

File history and rename detection

You never really need to worry about "preserving history" in Git. Git does not have file history at all, it has only commit history. That is, each commit "points to" (contains the hash ID of) its parent—or, for a merge, both its parents—and this is the history: commit E is preceded by commit D, while commit D is preceded by commit C, and so on. As long as you have the commits, you have the history.

That said, Git can try to synthesize the history of one specific file, using git log --follow. You specify a starting commit and a path name, and Git checks, commit-by-commit, to see if the file was renamed when comparing the current commit's parent to the current commit. This uses Git's rename detection to identify that file a/b.txt in commit L (left) is "the same file" as file c/d.txt in commit R (right).
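As a rough illustration of the difference --follow makes, here is a minimal sketch that builds a throwaway repository and moves one file. The paths a/b.txt and c/d.txt come from the text above; the commit messages, the demo identity, and the temporary directory are made up for the example.

```shell
set -eu
# Throwaway repository; everything created here is for illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir a
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > a/b.txt
git add a/b.txt
git commit -q -m "add a/b.txt"

# Move the file without changing its contents.
mkdir c
git mv a/b.txt c/d.txt
git commit -q -m "move a/b.txt to c/d.txt"

# Without --follow, only commits that touch the path c/d.txt are listed.
plain=$(git log --format=%s -- c/d.txt)
# With --follow, Git pairs c/d.txt with a/b.txt and keeps walking back.
followed=$(git log --follow --format=%s -- c/d.txt)

printf 'plain:\n%s\n\nfollowed:\n%s\n' "$plain" "$followed"
```

The plain log should stop at the commit that introduced the name c/d.txt, while the --follow log should also list the commit that created a/b.txt.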

Rename detection has a lot of fiddly knobs, but at the base level, it's basically this:

  • Git looks at all the file names in commit L.
  • Git looks at all the file names in commit R.
  • If there's a file name that vanishes from L and appears in R, such as a/b.txt is gone and c/d.txt is all-new, why, that's a candidate for a detected rename.
  • Now that there are candidates (unpaired L files and unpaired R files), Git compares the contents of these unpaired files.

Unpaired files go into a pairing queue (one for L, one for R), and Git hashes the contents of all the files. It already has the internal Git hash so it compares all those directly, first. If a file is completely unchanged, it has the same Git hash ID (but different names) in L and R, and can be immediately paired-up and removed from the pairing queues.

Now that exact-matches are taken out, Git tries the long slow slog. It takes one unpaired L file, and computes a "similarity index" for every R file. If some R file is sufficiently similar—or several are—it takes the "most similar" R file and pairs it with the L file. If no file is sufficiently similar, the L file remains unpaired (is taken out of the queue) and is considered "deleted from L". Eventually there are no files in the unpaired L queue, and whatever files remain in the unpaired R queue, those files are "added" (new in R). Meanwhile, all paired-up files have been renamed.

What this means is: When comparing (git diff) commit L to R, if two files are sufficiently similar, they get paired up as a rename. The default similarity threshold is 50%, so the files need to be at least a 50% match (whatever that means; the similarity index computation is somewhat opaque), but an exact match is much easier and faster for Git.
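The pairing described above can be observed with git diff --name-status. This is only a sketch in a scratch repository: the variable names L and R mirror the commit labels used in the text, while x.txt, y.txt, and the file contents are invented for the demo. The -M50% option spells out the default threshold explicitly, so the sketch should behave the same on Git versions older than 2.9.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit L: two files.
mkdir a
printf 'shared line %s\n' 1 2 3 4 5 6 7 8 9 10 > a/b.txt
printf 'old content %s\n' 1 2 3 4 5 6 7 8 9 10 > x.txt
git add .
git commit -q -m "commit L"
L=$(git rev-parse HEAD)

# Commit R: a/b.txt moves unchanged; x.txt is replaced by an unrelated y.txt.
mkdir c
git mv a/b.txt c/d.txt
git rm -q x.txt
printf 'entirely different text, row %s\n' A B C D E F G H I J > y.txt
git add y.txt
git commit -q -m "commit R"
R=$(git rev-parse HEAD)

# Compare L to R with rename detection at a 50% similarity threshold.
status=$(git diff -M50% --name-status "$L" "$R")
printf '%s\n' "$status"
```

The unchanged file should be reported as an exact rename (status R100, from a/b.txt to c/d.txt), while x.txt and y.txt share no content and should show up as a separate deletion (D) and addition (A).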

Note that git log --follow enables rename detection (on just one target R file, as we're working backwards through the log, comparing the parent commit to just the one file whose name we know in the child). Since Git version 2.9, both git diff and git log -p have rename detection turned on by default. In older versions, you had to pass the -M option (which can also set the similarity threshold), or configure diff.renames to true, to get git diff and git log -p to do rename detection.

There is also a maximum length for the pairing queues. This has been doubled twice, once in Git 1.5.6 and once in Git 1.7.5. You can control it yourself: it is configurable as diff.renameLimit and merge.renameLimit. After those two increases the defaults were 400 for diff and 1000 for merge; more recent Git releases have raised them again, so check the git config documentation for your version. (If you set these to zero, Git uses its own internal maximum, which can chew up enormous amounts of CPU time; that's why these two limits exist in the first place. If you set diff.renameLimit but not merge.renameLimit, git merge uses your diff setting.)
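For reference, these settings can be written per repository with git config. A minimal sketch, run in a scratch repository; the value 2000 is an arbitrary illustration, not a recommendation.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Only matters on Git older than 2.9, where git diff and git log -p
# do not detect renames unless asked to.
git config diff.renames true

# Pairing-queue limits; 2000 is an arbitrary value chosen for the demo.
git config diff.renameLimit 2000
git config merge.renameLimit 2000

# Read the values back.
git config diff.renames
git config diff.renameLimit
git config merge.renameLimit
```

Without --global, git config writes to the repository's own configuration, so the change stays local to that one clone.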

This leads to a rule of thumb that applies to git log --follow: If possible, when you intend to rename some file or set of files, commit the rename step by itself, without changing any of the file contents. If possible, keep the number of renamed files fairly small: at or below 400, for instance. You can commit more renames in multiple steps, 400 at a time. But remember that you're trading off git log --follow ability and speed against cluttering up your history with pointless commits: if you need to rename 50000 files, maybe you should just do it.

But how does this affect merging? Well, git merge, like git log --follow, always turns on rename detection. But which commit is L and which commit or commits are R?

Merging and rename detection

Whenever you run:

git merge <commit-specifier>

Git has to find the merge base between your current (HEAD) commit and the specified other commit. (Usually this is just git merge <branchname>. That selects the tip commit of that other branch by resolving the branch name to the commit to which it points. By the definition of "branch name" in Git, that's the tip commit of that branch, so that this "just works". But you can specify any commit by hash ID, for instance.) Let's call this merge base commit B (for base). We already know that our own commit is HEAD, though some things call this "local". Let's call the other commit O (for other), though some things call this "remote" (which is silly: nothing in Git is remote!).

Git then does, in effect, two git diffs. One compares B vs HEAD, so for this particular diff, L is B and R is HEAD. Git will detect, or fail to detect, renames according to the rules we saw above. Then Git does the other git diff, which compares B to O. Git will detect or fail to detect renames according to the same rules yet again.

If some file is renamed in B-vs-HEAD, Git diffs its contents as usual. If some file is renamed in B-vs-O, Git diffs its contents as usual. If a single B file F is renamed to two different names in HEAD and O, Git declares a rename/rename conflict on that file, and leaves both names in the work-tree for you to clean up. If it's renamed in only one diff (it's still called F in either HEAD or O), then Git stores the file in the work-tree using the new name from whichever side renamed it. In any case, Git tries to combine the two sets of changes (from B-vs-HEAD and B-vs-O) as usual. [1]
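To make the two diffs concrete, here is a small sketch of the "renamed on one side only" case. The branch names main and other, the file names F.txt and G.txt, and the file contents are assumptions made up for the demo; the commit roles B, HEAD, and O follow the text above.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # pin the initial branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit B, the future merge base.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > F.txt
git add F.txt
git commit -q -m "base"

# Commit O: the other side edits F.txt under its original name.
git checkout -q -b other
sed 's/^line 10$/line 10 (edited on other)/' F.txt > F.tmp && mv F.tmp F.txt
git commit -q -am "edit F.txt"

# HEAD: our side renames F.txt without touching its contents.
git checkout -q main
git mv F.txt G.txt
git commit -q -m "rename F.txt to G.txt"

# The two diffs that git merge performs, in effect.
base=$(git merge-base HEAD other)
ours=$(git diff -M --name-status "$base" HEAD)     # L is B, R is HEAD
theirs=$(git diff -M --name-status "$base" other)  # L is B, R is O
printf 'B vs HEAD: %s\nB vs O:    %s\n' "$ours" "$theirs"

# The merge keeps the new name from HEAD and the edit from O.
git merge -q --no-edit other
ls
tail -n 1 G.txt
```

After the merge, the work-tree should contain only G.txt, and its last line should carry the edit that was made to F.txt on the other branch.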

Of course, for Git to detect the rename, the contents of the file must be sufficiently similar, as always. This is particularly problematic for Java files (and sometimes Python as well), where the file names become embedded in import statements. If a module consists mostly of import statements, with just a few lines of code of its own, the rename-induced changes will overwhelm the remaining file contents, and the files will not be even a 50% match.

There is a solution, though it is a bit ugly. As with the rule of thumb for git log --follow, we can commit just the renames first, and then commit the content-changing "fix all the imports" as a separate commit. Then, when we go to merge, we can do two or even three merges:

git checkout ...  # whatever branch we plan to merge into
git merge <hash>  # merge with everything just before the Great Renaming

Since no files are renamed, this merge will go as well, or as poorly, as usual. Here's the result, in graph form. Note that the hash we supplied to the git merge command was the hash of commit A, the commit just before R, which does all the renames:

...--*--o--...--o--M    <-- mainline
      \           /
       o--o--...-A--R--...--o   <-- develop, with renames at R

Then:

git merge <hash of R>

Since the merge base is now commit A, and every file's content is identical in A and R (only the names differ), the effect here is merely to pick up all the renames. We keep the file contents from HEAD commit M, but the names from R. This merge should succeed automatically:

...--*--o--...--o--M--N    <-- mainline
      \           /  /
       o--o--...-A--R--...--o   <-- develop, with renames at R

and now we can git merge develop to proceed to merge the development branch.
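The whole sequence can be rehearsed in a scratch repository. This is a sketch under simplifying assumptions: a single file old/Foo.txt stands in for the renamed tree, the branch names mainline and develop match the graphs, and the one-line "fix" after the rename is far smaller than a real import clean-up would be. All file names, contents, and commit messages are invented for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # pin the initial branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir old
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old/Foo.txt
git add old/Foo.txt
git commit -q -m "base"

# develop: commit A (ordinary work), commit R (pure rename), then a fix-up.
git checkout -q -b develop
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "A: ordinary work"
A=$(git rev-parse HEAD)
mkdir new
git mv old/Foo.txt new/Foo.txt
git commit -q -m "R: rename only"
R=$(git rev-parse HEAD)
sed 's/^line 10$/line 10 (fixed after rename)/' new/Foo.txt > Foo.tmp && mv Foo.tmp new/Foo.txt
git commit -q -am "fix contents after the rename"

# mainline: someone else edits the file under its old name.
git checkout -q mainline
sed 's/^line 1$/line 1 (edited on mainline)/' old/Foo.txt > Foo.tmp && mv Foo.tmp old/Foo.txt
git commit -q -am "edit old/Foo.txt"

git merge -q --no-edit "$A"      # merge M: everything before the renames
git merge -q --no-edit "$R"      # merge N: picks up the renames only
git merge -q --no-edit develop   # finally, the rest of develop
git --no-pager log --oneline --graph
```

At the end, old/Foo.txt should be gone and new/Foo.txt should carry both the mainline edit and the post-rename fix, with three merge commits in the history.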

In many cases, we won't need to make merge M, but it might not be a bad idea to do it anyway if we need to make merge N just for all the renames. The reason is that commit R is not functional: it has the wrong names for imports. Commit R must be skipped during bisection. This means that merge N is similarly non-functional and must be skipped during bisection. It might be good to have M present, since M could actually work.

Note that if you do any of this, you are distorting / contorting your source code just to please your version control system. This is not a good situation. It may be less bad than your other alternatives, but don't tell yourself it's good.


[1] I still need to see what happens to the two copies of the file when there is a rename/rename conflict. Since Git leaves both names in the work-tree, do both names contain the same merged contents, plus any conflict markers if needed? That is, if the file was named base.txt and is now named head.txt and other.txt, do the work-tree versions of head.txt and other.txt always match?

torek answered Oct 05 '22 01:10