
Git and file renaming and replacing

Tags: git, rename

I don't generally have a problem with renaming with git, but I've run across a really difficult problem I'm trying to work out.

For various reasons, I have a situation where we have a file dir1/file. Due to some long ago decisions, it's in completely the wrong place and needs to be moved to dir2/file.

However there's a lot of code that needs to be changed and for various reasons we have to keep the file in the new place and the old place for a while.

So, the natural(ish) approach would be to do this:

git mv dir1/file dir2/file
git commit -a

so far so good:

> git diff master --name-status --find-renames
R100 dir1/file dir2/file

So then we do

ln -s ../dir2/file dir1/file
git add dir1/file
git commit

but this happens

> git diff master --name-status --find-renames
A    dir2/file
T    dir1/file

And if anyone changes dir1/file on master and I try to pull it, I get told there's a merge conflict in dir1/file, and dir2/file is left unchanged. I thought from reading other posts that git tracked content, but it seems to be tracking filenames as well as content, and completely missing the fact that the content has moved.

So how on earth do I get git to recognise that I have in fact renamed a file and then added a new file which just happens to have the same name as the old one?

Note: I'd rather not do this as multiple pushes. There are several files like this that are affected, the chances that someone is changing one of them in parallel are quite high, and there's no guarantee they will be able to do one pull to get the rename and then another pull to get the symlink.

Additional example: I was removing a function from a python module's __init__.py that should never have been in there; the __init__.py should have been empty. This too is not spotted as a rename, even though the contents of the new file are 99% identical to the original __init__.py and the contents of the new __init__.py are 0% identical to the old contents. Everything is fine until I add a file with the same name.

Tom Tanner asked Feb 13 '23 01:02



1 Answer

Git does, in fact, track content rather than—or rather, we should say "in addition to"—names. The diff goes wrong because git diff (necessarily) tries to map names and compare the contents of two separate commits (or one commit and the current working directory, or one commit and the current index, etc., but these are just variations on the theme of "compare two commits").

More specifically, when git diff compares trees[1] T1 and T2, it assumes by default that the only candidates for a rename are those where some file name exists in T1 but not in T2, and some other (different) file name exists in T2 but not in T1.

Thus, when you make the first commit, you have two commits (let's call these A and B) with two trees, where dir1/file "goes missing" going from A to B and dir2/file appears in B. That's a candidate for rename-detection, and because the file contents are 100% identical, git easily spots the rename and gives you the R100 diff output.

When you make the second commit, you add commit C with a third tree. Comparing B and C works fine: dir2/file appears in both, and the new symlink dir1/file appears only in C, and the diff output from this pair is fine too. The problem comes in when comparing A and C: now dir1/file appears in both, while dir2/file is only in C, so git diff does not realize that there's a rename candidate.
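The three comparisons can be reproduced in a throwaway repository. This is a minimal sketch, assuming a POSIX shell, git on the PATH, and a filesystem that supports symlinks; the tags A, B and C, the file contents, and the committer identity are illustrative names chosen here, not part of the original setup.

```shell
#!/bin/sh
# Sketch: rebuild commits A, B and C in a scratch repository and compare them.
# Tag names, file contents and the identity below are assumptions for illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit A: the file lives in the old location.
mkdir dir1 dir2
printf 'line %s\n' 1 2 3 4 5 > dir1/file
git add dir1/file
git commit -q -m "A: file in dir1"
git tag A

# Commit B: pure rename, contents untouched.
git mv dir1/file dir2/file
git commit -q -m "B: move file to dir2"
git tag B

# Commit C: a symlink reuses the old path.
mkdir -p dir1
ln -s ../dir2/file dir1/file
git add dir1/file
git commit -q -m "C: symlink at old path"
git tag C

# Same diff options for each pair of trees.
ab=$(git diff --name-status --find-renames A B)
bc=$(git diff --name-status --find-renames B C)
ac=$(git diff --name-status --find-renames A C)

printf 'A vs B\n%s\n\nB vs C\n%s\n\nA vs C\n%s\n' "$ab" "$bc" "$ac"
```

Only the A vs C comparison should lose the rename, because dir1/file exists on both sides of that pair and so is not offered as a rename source.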

There is a flag, --find-copies-harder—or you may specify -C more than once—that (rather unsurprisingly) makes the copy/rename detection code work harder. In this case git will consider the possibility that a file that "appears unchanged" (has the same name in both trees) might have been copied or renamed to another file that "appears new" (exists in second tree but not in first). This is not enabled by default because the fully-general version is extremely computationally-intensive.
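To see what the flag changes for a manual diff, the two end-state trees can be compared with and without it. This is a sketch under the same assumptions as the question's layout (dir1/file, dir2/file); the tags old and new and the file contents are hypothetical. Since dir1/file still exists in the newer tree, git would be expected to report dir2/file as a copy of the old dir1/file rather than as a rename.

```shell
#!/bin/sh
# Sketch: compare the default rename detection with --find-copies-harder.
# Tags "old" and "new" and the demo identity are illustrative assumptions.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir dir1 dir2
printf 'line %s\n' 1 2 3 4 5 > dir1/file
git add dir1/file
git commit -q -m "old layout"
git tag old

# Move the file and put a symlink at the old path; only the resulting
# tree matters for a two-tree diff, so one commit is enough here.
git mv dir1/file dir2/file
mkdir -p dir1
ln -s ../dir2/file dir1/file
git add dir1/file
git commit -q -m "new layout"
git tag new

default_diff=$(git diff --name-status --find-renames old new)
harder_diff=$(git diff --name-status --find-copies-harder old new)

printf 'default:\n%s\n\nfind-copies-harder:\n%s\n' "$default_diff" "$harder_diff"
```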


Unfortunately, there is no way to pass arbitrary diff options to the diffs that git merge computes. The merge command sets some defaults (-M50%, etc.) and does several diffs; the default strategy lets you adjust the rename threshold (-X find-renames=<n>), but it does not let you set --find-copies-harder. So even if this works for a manual git diff, it won't solve your merge conflict.

Note that when you do a merge,[2] git computes just two sets of diffs: that from the merge-base[3] to the current HEAD, and that from the merge-base to the merged-in commit (git merges a commit, not a branch: the fact that the result merges that branch, when that commit is the tip of a branch, is a sort of "intentional coincidence"). So it is possible to make the rename as one commit, and the symlink as a second, but to get git merge to "see" the rename, you must also do two separate git merges. It's not particularly pleasing, but to fix this, you would have to make git's diff machinery smarter, so that it could at least figure out that a file-type-change makes for much greater chance of finding a rename if it "finds copies/renames a bit harder".
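The two-merge approach can be sketched as follows. This is one possible arrangement, not the only one: it assumes a colleague who edited dir1/file starting from commit A and who merges the rename commit first and the symlink commit second. The branch names work and colleague, the tags, and the file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: merge the rename commit (B) and the symlink commit (C) separately,
# so each merge diffs against a base where the rename is still visible.
# Branch names, tags and the identity below are illustrative assumptions.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir dir1 dir2
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > dir1/file
git add dir1/file
git commit -q -m "A: file in dir1"
git tag A

# The restructuring work: rename first, symlink second.
git checkout -q -b work
git mv dir1/file dir2/file
git commit -q -m "B: move file to dir2"
git tag B
mkdir -p dir1
ln -s ../dir2/file dir1/file
git add dir1/file
git commit -q -m "C: symlink at old path"
git tag C

# Meanwhile, someone edits the file at its old path, starting from A.
git checkout -q -b colleague A
echo 'edited at the old path' >> dir1/file
git commit -q -a -m "edit dir1/file"

# First merge: base is A, so the A-to-B rename is detected and the edit
# follows the file to dir2/file.
git merge -q --no-edit B
# Second merge: base is now B, so C only adds the symlink.
git merge -q --no-edit C

ls -l dir1/file
cat dir2/file
```

After both merges the edit should live in dir2/file and dir1/file should be the symlink; merging C in a single step is the case the question reports as conflicting.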

(Note that adding this to the diff machinery would fix both issues—git diff not seeing the rename, and git merge not seeing the rename—all at once.)


[1] By "trees" here I mean full file trees, rather than git's tree objects.

[2] More specifically, this is the case for a two-parent merge. Octopus merges are handled differently. I have not dug into the innards of octopus merges and can't really say anything more about those.

[3] The merge-base depends on the two (or more) commits to be merged, and to complicate things, with the default (recursive) strategy, if there are multiple merge-base candidates, git computes a "virtual merge base", which is not necessarily the same as any actual commit. The details are not something I can explain properly here: I know the general idea but not the specifics within git, and in any case it's rarely important and not directly relevant to your issue. There's a fairly nice example here, if you want to read more, although the example uses some rather ClearCase-like terminology.

torek answered Feb 15 '23 09:02
