Imagine that you have two files in a git repository, say <code>A.txt</code> and <code>B.txt</code>. Is it possible to concatenate the two files into a third one, <code>A+B.txt</code>, remove the original <code>A.txt</code> and <code>B.txt</code>, and commit it all so that the history is still preserved? That is, if I ran <code>git log --follow A+B.txt</code>, would I see that the content originated from the <code>A.txt</code> and <code>B.txt</code> files? I've tried separating the files into two different branches and then merging them into a new file (while removing the old ones), but to no avail.

git combining two files into one with history preserved

2 Answers

The long answer is 'yes'!

Full credit to Raymond Chen's article Combining two files into one while preserving line history:

Imagine you had two files: fruits & veggies

[Figure: git blame output for both fruits and veggies]

The naïve way of combining the files would be to do it in a single commit, but you'll lose line history on one of the files (or both).

You could tweak the git blame algorithms with options like -M and -C to get it to try harder, but in practice, you don’t often have control over those options (e.g. the git blame may be performed on a server).
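As a rough illustration of the naïve single-commit approach and the -M/-C blame options mentioned above, the sketch below builds a throwaway repository and runs git blame both ways. The file contents and the Alice/Bob authors are made up for the demo. Whether the extra options recover the original authors depends on git's similarity thresholds and on how much text was moved, so the script prints both results for comparison rather than promising a particular attribution.

```shell
#!/bin/sh
# Sketch only: naive one-commit combination in a scratch repository.
# Authors "Alice" and "Bob" and the file contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits" --author="Alice <alice@example.com>"

printf 'artichoke\nbroccoli\ncarrot\n' > veggies
git add veggies
git commit -q -m "add veggies" --author="Bob <bob@example.com>"

# Naive approach: concatenate, delete the originals, commit once.
cat fruits veggies > produce
git rm -q fruits veggies
git add produce
git commit -q -m "combine fruits and veggies"

# Default blame versus blame that also looks for moved/copied lines.
default_blame=$(git blame produce)
harder_blame=$(git blame -M -C -C -C produce)

echo "--- git blame produce"
echo "$default_blame"
echo "--- git blame -M -C -C -C produce"
echo "$harder_blame"
```

Comparing the two listings on your own repository is the quickest way to see how much the options help for your files.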

The trick is to use a merge with two forked branches

In one branch, we rename veggies to produce.

In the other branch, we rename fruits to produce.

    git checkout -b rename-veggies
    git mv veggies produce
    git commit -m "rename veggies to produce"

    git checkout -
    git mv fruits produce
    git commit -m "rename fruits to produce"

Then merge the first into the second:

    git merge -m "combine fruits and veggies" rename-veggies

This will generate a merge conflict. That's okay: now take the changes from each branch's produce file and combine them into one. Here's a simple concatenation (but resolve the merge conflict however you please):

    cat "produce~HEAD" "produce~rename-veggies" >produce
    git add produce
    git merge --continue

The resulting produce file was created by a merge, so git knows to look in both parents of the merge to learn what happened.

And that’s where it sees that each parent contributed half of the file, and it also sees that the files in each branch were themselves created via renames of other files, so it can chase the history back into both of the original files.

Each line should be correctly attributed to the person who introduced it in the original file, whether it’s fruits or veggies. People investigating the produce file get a more accurate history of who last touched each line of the file.
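The whole procedure can be tried end to end in a scratch repository. This is a minimal sketch, not a definitive recipe: the Alice/Bob authors and the file contents are invented for the demo. Depending on the git version, the conflicted merge may leave the two sides as produce~HEAD and produce~rename-veggies (as in the steps above) or may write a single produce file with conflict markers. To avoid relying on either layout, the sketch reads each side directly from the two branch tips with git show, and it uses git commit --no-edit in place of git merge --continue so that no editor opens.

```shell
#!/bin/sh
# Sketch only: merge-based combination in a scratch repository.
# Authors "Alice" and "Bob" and the file contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits" --author="Alice <alice@example.com>"

printf 'artichoke\nbroccoli\ncarrot\n' > veggies
git add veggies
git commit -q -m "add veggies" --author="Bob <bob@example.com>"

# Branch 1: pure rename of veggies.
git checkout -q -b rename-veggies
git mv veggies produce
git commit -q -m "rename veggies to produce"

# Branch 2 (the original branch): pure rename of fruits.
git checkout -q -
git mv fruits produce
git commit -q -m "rename fruits to produce"

# The merge is expected to stop with a conflict on produce.
git merge -m "combine fruits and veggies" rename-veggies || true

# Resolve by concatenating each side as committed on its branch tip.
{ git show HEAD:produce; git show rename-veggies:produce; } > produce
rm -f "produce~HEAD" "produce~rename-veggies"   # leftovers on some versions
git add produce
git commit -q --no-edit                         # concludes the merge

# Count blamed lines per author.
alice_lines=$(git blame --line-porcelain produce | grep -c '^author Alice')
bob_lines=$(git blame --line-porcelain produce | grep -c '^author Bob')
echo "Alice: $alice_lines, Bob: $bob_lines"
git blame produce
```

The per-author counts give a quick check that lines from both original files are still attributed to the people who wrote them.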

For best results, your rename commit should be a pure rename. Resist the temptation to edit the file’s contents at the same time you rename it. A pure rename ensures that git’s rename detection will find the match. If you edit the file in the same commit as the rename, then whether the rename is detected as such will depend on git’s “similar files” heuristic.

Check out the full article for a step-by-step breakdown and more explanation.

Originally, I had thought this might be a use case for git merge-file, doing something like this:

    >produce echo #empty
    git merge-file fruits produce veggies --union -p > produce
    git rm fruits veggies
    git add produce
    git commit -m "combine fruits and veggies"

However, all this does is run the merge algorithm against two different files. Once committed, the result is identical to what you would get by editing the file by hand and committing the changes manually.
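To make that point concrete, here is a small sketch in a scratch repository. The file contents and the helper file name base are made up, and an explicitly empty file is used as the common ancestor. It compares the blob that git merge-file --union produces with a plain concatenation, and checks that the resulting commit is an ordinary single-parent commit rather than a merge.

```shell
#!/bin/sh
# Sketch only: git merge-file --union versus plain concatenation.
# File contents and the helper file name "base" are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'apple\nbanana\n' > fruits
printf 'carrot\nleek\n' > veggies
git add fruits veggies
git commit -q -m "add fruits and veggies"

# Blob id of what a manual concatenation would contain.
manual_blob=$(cat fruits veggies | git hash-object --stdin)

# Three-way file merge against an empty common ancestor.
: > base
git merge-file -p --union fruits base veggies > produce
rm base
merged_blob=$(git hash-object produce)

git rm -q fruits veggies
git add produce
git commit -q -m "combine fruits and veggies"

# One commit id plus one parent id: an ordinary, non-merge commit.
id_count=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "manual=$manual_blob merged=$merged_blob ids=$id_count"
```

Because the commit has a single parent, git has nothing extra to follow when it later walks the history of produce.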

answered Oct 21 '22 01:10

KyleMit

The short answer is "no" (or perhaps even Mu). (But for a way to get useful synthesized line history for a combined file via git blame, see KyleMit's answer.)

History, in Git, is the set of commits. There is no such thing as "file history": you either have a commit, or you don't, and that commit has one or more parents, or it doesn't. This means that "file history" as a thing doesn't exist—and yet, git log --follow exists. This is self-contradictory: How can git log --follow produce a file history, if file history doesn't exist?

The answer is that git log --follow cheats. It doesn't really find file history. It looks through history and constructs a sub-history by changing the (single) name of the file it is looking for. It looks at each commit, one at a time, and runs a (sped-up, limited) git diff --find-renames of that commit against its parent.¹ If the diff says that file X.txt in the parent was renamed to A.txt in the child, and you're running git log --follow A.txt, the code in git log now starts looking for X.txt.
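The behaviour described above can be observed in a scratch repository. In this sketch (file contents are made up; X.txt and A.txt follow the names used in the paragraph), the rename is not stored anywhere in the commit. It only shows up when the child is diffed against its parent with rename detection, and git log --follow uses that to switch the name it is looking for.

```shell
#!/bin/sh
# Sketch only: how a rename appears to git diff and git log --follow.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'one\ntwo\nthree\n' > X.txt
git add X.txt
git commit -q -m "add X.txt"

git mv X.txt A.txt
git commit -q -m "rename X.txt to A.txt"

# Rename detection between parent and child: status R100, old name, new name.
rename_status=$(git diff --find-renames --name-status HEAD~1 HEAD)
echo "$rename_status"

# Without --follow only the commit that introduced the name A.txt is listed.
plain=$(git log --format=%s -- A.txt)
# With --follow the search continues under the old name X.txt.
followed=$(git log --follow --format=%s -- A.txt)

echo "--- without --follow"; echo "$plain"
echo "--- with --follow";    echo "$followed"
```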

Since there's no code to start looking for more than one file at a time, you can't get this particular cheat to accommodate your desired situation, which is to go from looking for one particular file to more-than-one file. (There are actually two problems here. One is that, due to the rather limited internal implementation,² git log --follow can only look at one file at a time. The other is that rename detection does not include "combine detection": there is a form of "split detection", in which Git will do copy-finding, enabled with --find-copies and --find-copies-harder. The latter is very compute-intensive, and both are working in the wrong direction here, although Git could be made to do the right thing simply by reversing the order of the diff.)
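For reference, the copy-finding options mentioned above can be seen at work in a scratch repository. This is only a sketch with made-up file names and contents: A.txt is copied to C.txt without being modified itself, which is the case where --find-copies reports a plain addition and --find-copies-harder, which also considers unmodified files as copy sources, reports a copy.

```shell
#!/bin/sh
# Sketch only: --find-copies versus --find-copies-harder.
# File names and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'alpha\nbeta\ngamma\ndelta\n' > A.txt
git add A.txt
git commit -q -m "add A.txt"

# Copy the file; the original stays untouched in this commit.
cp A.txt C.txt
git add C.txt
git commit -q -m "copy A.txt to C.txt"

# Only files changed in the same commit are candidate sources here.
copies=$(git diff --find-copies --name-status HEAD~1 HEAD)
# Every file in the parent tree is a candidate source here.
harder=$(git diff --find-copies-harder --name-status HEAD~1 HEAD)

echo "--find-copies:        $copies"
echo "--find-copies-harder: $harder"
```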

¹As this implies, --follow doesn't look at merge diffs at all, at least by default. See also the related question "`git log --follow --graph` skips commits".

²aka "cheesy hack"

answered Oct 21 '22 02:10

torek


Tags:

git

file

merge

Peter Uhnak