Imagine that you have two files in a git repository, say A.txt and B.txt.
Is it possible to concatenate the two files into a third one, A+B.txt, removing the original A.txt and B.txt and committing it all, so that the history is still preserved?
That is, if I ran git log --follow A+B.txt, would I see that the content originated from the A.txt and B.txt files?
I've tried separating the files into two different branches and then merging them into a new file (while removing the old ones), but to no avail.
The long answer is 'yes'!
Full credit to Raymond Chen's article Combining two files into one while preserving line history:
Imagine you had two files: fruits & veggies.
The naïve way of combining the files would be to do it in a single commit, but you'll lose line history on one of the files (or both).
You could tweak the git blame algorithms with options like -M and -C to get it to try harder, but in practice, you don't often have control over those options (e.g. the git blame may be performed on a server).
The trick is to use a merge with two forked branches:
- In one branch, we rename veggies to produce.
- In the other branch, we rename fruits to produce.

git checkout -b rename-veggies
git mv veggies produce
git commit -m "rename veggies to produce"

git checkout -
git mv fruits produce
git commit -m "rename fruits to produce"
Then merge the first into the second:
git merge -m "combine fruits and veggies" rename-veggies
This will generate a merge conflict, and that's okay. Now take each branch's version of the produce file and combine them into one. Here's a simple concatenation (but resolve the merge conflict however you please):
cat "produce~HEAD" "produce~rename-veggies" >produce
git add produce
git merge --continue
The resulting produce file was created by a merge, so git knows to look in both parents of the merge to learn what happened. And that's where it sees that each parent contributed half of the file, and it also sees that the files in each branch were themselves created via renames of other files, so it can chase the history back into both of the original files.
Each line should be correctly attributed to the person who introduced it in the original file, whether it’s fruits or veggies. People investigating the produce file get a more accurate history of who last touched each line of the file.
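To check the attribution claim, here is a minimal end-to-end sketch in a throwaway repository. The authors alice and bob, the file contents, and the temporary directory are all made up for illustration. Depending on the Git version, the conflict may show up as separate produce~HEAD and produce~rename-veggies files or as conflict markers inside produce, so this sketch reads each branch's committed version with git show instead of relying on either layout.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

# Two files, each introduced by a different (made-up) author.
printf 'apple\nbanana\ncherry\n' > fruits
git add fruits
git commit -q -m "add fruits" --author="alice <alice@example.com>"
printf 'asparagus\nbroccoli\ncarrot\n' > veggies
git add veggies
git commit -q -m "add veggies" --author="bob <bob@example.com>"

# Remember the starting branch (its name depends on local configuration).
base=$(git symbolic-ref --short HEAD)

# Branch 1: pure rename of veggies to produce.
git checkout -q -b rename-veggies
git mv veggies produce
git commit -q -m "rename veggies to produce"

# Branch 2 (the starting branch): pure rename of fruits to produce.
git checkout -q "$base"
git mv fruits produce
git commit -q -m "rename fruits to produce"

# The merge is expected to stop with a conflict, so do not abort on failure.
git merge -m "combine fruits and veggies" rename-veggies || true

# Resolve by concatenating each side's committed version of produce.
git show HEAD:produce > combined.tmp
git show rename-veggies:produce >> combined.tmp
rm -f "produce~HEAD" "produce~rename-veggies"  # only present on some Git versions
mv combined.tmp produce
git add produce
git commit -q --no-edit  # concludes the merge without opening an editor

# Count how many lines blame attributes to each original author.
alice_lines=$(git blame --line-porcelain produce | grep -c '^author alice$')
bob_lines=$(git blame --line-porcelain produce | grep -c '^author bob$')
git blame produce
echo "alice: $alice_lines line(s), bob: $bob_lines line(s)"
```

If the trick works as described, the lines that came from fruits should be blamed on alice and the lines that came from veggies on bob, even though neither of them ever touched a file called produce.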
For best results, your rename commit should be a pure rename. Resist the temptation to edit the file's contents at the same time you rename it. A pure rename ensures that git's rename detection will find the match. If you edit the file in the same commit as the rename, then whether the rename is detected as such will depend on git's "similar files" heuristic.
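One way to confirm that a rename commit really is a pure rename is to ask git diff for a name-status summary with rename detection enabled. This is only a sketch in a scratch repository with made-up contents; an unchanged file moved with git mv should be reported with status R100 (a rename at 100% similarity).

```shell
#!/bin/sh
set -eu

# Scratch repository; the file contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

printf 'asparagus\nbroccoli\ncarrot\n' > veggies
git add veggies
git commit -q -m "add veggies"

# Pure rename: no content change in the same commit.
git mv veggies produce
git commit -q -m "rename veggies to produce"

# Summarize the last commit with rename detection turned on.
status=$(git diff --find-renames --name-status HEAD~1 HEAD)
echo "$status"
```

A similarity score below 100 in that output would mean the file was also edited in the rename commit, which is the situation the paragraph above warns about.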
Check out the full article for a step-by-step breakdown and more explanation.
Originally, I had thought this might be a use case for git merge-file, doing something like this:
>produce echo #empty
git merge-file fruits produce veggies --union -p > produce
git rm fruits veggies
git add produce
git commit -m "combine fruits and veggies"
However, all this does is simulate the merge diffing algorithm against two different files. Once committed, the result is identical to what you would get by editing the file by hand and committing the changes.
The short answer is "no" (or perhaps even Mu). (But for a way to get useful synthesized line history for a combined file via git blame, see KyleMit's answer.)
History, in Git, is the set of commits. There is no such thing as "file history": you either have a commit, or you don't, and that commit has one or more parents, or it doesn't. This means that "file history" as a thing doesn't exist, and yet git log --follow exists. This is self-contradictory: how can git log --follow produce a file history, if file history doesn't exist?
The answer is that git log --follow cheats. It doesn't really find file history. It looks through history and constructs a sub-history by changing the (single) name of the file it is looking for. It looks at each commit, one at a time, and runs a (sped-up, limited) git diff --find-renames of that commit against its parent.[1] If the diff says that file X.txt in the parent was renamed to A.txt in the child, and you're running git log --follow A.txt, the code in git log now starts looking for X.txt.
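The name-switching behaviour can be seen in a small sketch. The repository and file contents here are made up; the file names X.txt and A.txt mirror the example above. Without --follow, git log only reports commits that touch the path A.txt; with --follow, it should also pick up the commit that created the file under its old name.

```shell
#!/bin/sh
set -eu

# Scratch repository; contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

printf 'one\ntwo\nthree\n' > X.txt
git add X.txt
git commit -q -m "add X.txt"

git mv X.txt A.txt
git commit -q -m "rename X.txt to A.txt"

# Count the commits reported for A.txt with and without --follow.
plain=$(git log --oneline -- A.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- A.txt | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```

Note that --follow accepts exactly one path, which is the limitation discussed next.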
Since there's no code to start looking for more than one file at a time, you can't get this particular cheat to accommodate your desired situation, which is to go from looking for one particular file to more-than-one file. (There are actually two problems here. One is that, due to the rather limited internal implementation,[2] git log --follow can only look at one file at a time. The other is that rename detection does not include "combine detection": there is a form of "split detection", in which Git will do copy-finding, enabled with --find-copies and --find-copies-harder. The latter is very compute-intensive, and both are working in the wrong direction here, although it could be made to do the right thing simply by reversing the order of the diff.)
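To illustrate the copy-finding options mentioned above, here is a sketch in a scratch repository with made-up file names (original.txt, copy.txt). Per the git diff documentation, --find-copies only considers files modified in the same commit as copy sources, while --find-copies-harder also inspects unmodified files, which is why it costs more.

```shell
#!/bin/sh
set -eu

# Scratch repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

printf 'alpha\nbeta\ngamma\ndelta\n' > original.txt
git add original.txt
git commit -q -m "add original.txt"

# Copy the file without modifying the original.
cp original.txt copy.txt
git add copy.txt
git commit -q -m "copy original.txt to copy.txt"

# The source is unmodified, so plain copy detection should see only an addition.
with_copies=$(git diff --find-copies --name-status HEAD~1 HEAD)
# The harder variant also considers unmodified files as copy sources.
with_harder=$(git diff --find-copies-harder --name-status HEAD~1 HEAD)

echo "--find-copies:        $with_copies"
echo "--find-copies-harder: $with_harder"
```

Either way, the detection runs from one source file to several destinations (a split), not from several sources into one destination, which is why it does not help with the combined file in the question.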
[1] As this implies, --follow doesn't look at merge diffs at all, at least by default. See also `git log --follow --graph` skips commits.
[2] aka "cheesy hack"