A file was split into two other files in one commit: is there a way in Git to see what went where?



My problem:

I am a code reviewer, and I have the following situation in Git:

  • before: a.txt

Then a developer decided to split the content of a.txt into 2 files and add a few changes, all in one commit:

  • after: b.txt + few changes and c.txt + few changes

Is there an easy way to see:

  1. what came to b from a?
  2. what came to c from a?
  3. all extra changes apart from just moving stuff?

A specific command would help a lot.

A policy or workflow that prevents problems like this (where there is no way to visually diff the changes) would also help.


Do not be confused by the bounty-awarded answer below: I did not authorize it, because I don't think it's a good answer.

Is there an easy way to see:

  • what came to b from a?
  • what came to c from a?
  • all extra changes apart from just moving stuff?

I don't think there's really any way to extract this information other than visually inspecting the diff. However, it looks like we may be able to detect a split file using git diff with the -C option.

For example, I start with a file that contains 38 lines, move 24 of them into one file and 14 into another, and delete the original. git diff --name-status just tells me that I have renamed one file and added another:

R060    lorem.txt       fileA
A       fileB

But if we modify our command line to detect copies:

git diff --name-status -C30 HEAD^

We get:

C060    lorem.txt       fileA
R039    lorem.txt       fileB

The -C30 argument says "consider a file a copy if it is at least 30% similar to another file included in the commit". Note that there is a corresponding -M option that controls rename detection; it defaults to 50%.
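To answer "what went where" at the file level, the name-status records can be listed per source file. The following is only a rough sketch, not a built-in Git feature: the function name summarize_sources is made up for illustration, and it assumes the tab-separated --name-status format shown above.

```shell
#!/bin/sh
# Hypothetical helper: list copy/rename records as "source -> destination".
# Reads `git diff --name-status -C<n>` output (tab-separated) on stdin.
summarize_sources() {
    awk -F'\t' '
        # C### and R### records have three fields: status, source, destination.
        $1 ~ /^[CR][0-9]*$/ && NF == 3 {
            pct = substr($1, 2) + 0      # similarity score, e.g. "060" -> 60
            printf "%s -> %s (%d%% similar)\n", $2, $3, pct
        }
    '
}

# Sample input copied from the output above. In a real checkout one might run:
#   git diff --name-status -C30 HEAD^ | summarize_sources
printf 'C060\tlorem.txt\tfileA\nR039\tlorem.txt\tfileB\n' | summarize_sources
```

Dropping --name-status (that is, running git diff -C30 HEAD^) should print each detected copy or rename as a patch against lorem.txt, so the remaining hunks are the lines that were removed or changed relative to the original. That is still a visual inspection, but against the right baseline.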

A policy or workflow that prevents problems like this would also help.

What exactly are you trying to prevent? There's not really any way to distinguish "I split a file into two new files" from "I deleted a file and created two new files".

You could in theory prevent commits that both introduce new files and modify existing files. That would be relatively easy with a pre-receive hook, for example. But that's such a common situation, I'm not sure you'd want to do this in practice.

For the above, a pre-receive hook like the following might work:


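#!/bin/bash
# Reject a push whose tip commit both adds and modifies files (bash is needed for "< <(...)").

while read old new ref; do
        while read type name; do
                if [ "$type" = "A" ]; then
                        has_new=1
                elif [ "$type" = "M" ]; then
                        has_mod=1
                fi
        done < <(git show --name-status --format='' "$new")
done

if [ "$has_new" = 1 -a "$has_mod" = 1 ]; then
        echo "ERROR: commits may not both create and modify files" >&2
        exit 1
fi

exit 0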
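Note that the hook above runs git show on $new only, so it looks at just the tip commit of each pushed ref. A possible variation is to test every pushed commit on its own. The sketch below isolates the per-commit check in a helper; the name adds_and_modifies is hypothetical, and the rev-list loop in the comment is an assumption about how one might wire it in (it does not handle new branches, where $old is all zeros).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the name-status records on stdin
# contain at least one added (A) and at least one modified (M) file.
adds_and_modifies() {
    awk -F'\t' '
        $1 == "A" { added = 1 }
        $1 == "M" { modified = 1 }
        END { exit !(added && modified) }
    '
}

# Inside a pre-receive hook, each pushed commit could be checked like this
# (left as a comment because it needs a repository and hook input):
#   git rev-list "$old..$new" | while read -r commit; do
#       if git show --name-status --format='' "$commit" | adds_and_modifies; then
#           echo "ERROR: $commit both creates and modifies files" >&2
#           exit 1
#       fi
#   done || exit 1

# Canned example: one added and one modified file, so the check fires.
if printf 'A\tfileB\nM\tREADME\n' | adds_and_modifies; then
    echo "mixed commit"
fi
```

The trailing "|| exit 1" matters in the commented loop: the while body runs in a subshell because of the pipe, so its exit status has to be propagated explicitly.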

We could alternatively use our "split detection", discussed earlier, and implement something like:


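#!/bin/sh
# Reject a push in which one file is the source of more than one copy/rename.

while read old new ref; do
    git diff --name-status -C30 "$old" "$new" |
        awk '
            $1 ~ /^[CR]/ {total[$2]++}
            END {for (i in total) if (total[i] > 1) exit 1}
        '

    if [ $? -ne 0 ]; then
        echo "ERROR: detected a split file"
        exit 1
    fi
done

exit 0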

This will exit with an error if any file shows up as the "old name" of more than one file. Trying to push to a repository with this pre-receive hook, using the example from the first part of this answer, gets me:

$ git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 1.46 KiB | 1.46 MiB/s, done.
Total 5 (delta 0), reused 0 (delta 0)
remote: ERROR: detected a split file
To upstream
 ! [remote rejected] master -> master (pre-receive hook declined)

Maybe that helps? Without extensive testing I would worry about false positives with this solution.
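One way to start probing for false positives is to exercise the detection pipeline against a throwaway repository before installing the hook. The script below is a minimal sketch under a few assumptions: git is on the PATH, and the file names, line contents, and commit identity are all made up. It reproduces the 38-line split described earlier and runs the same name-status/awk combination.

```shell
#!/bin/sh
# Build a throwaway repository that reproduces the split, then run the same
# kind of detection the hook uses. All names and contents are illustrative.
repo=$(mktemp -d)
git init -q "$repo"

# Commit helper that supplies an identity so no global git config is needed.
commit() {
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

# A 38-line original file.
i=1
while [ "$i" -le 38 ]; do
    echo "line $i of the original lorem ipsum text"
    i=$((i + 1))
done > "$repo/lorem.txt"
git -C "$repo" add lorem.txt
commit "add lorem.txt"

# Split it: the first 24 lines go to fileA, the remaining 14 to fileB.
head -n 24 "$repo/lorem.txt" > "$repo/fileA"
tail -n 14 "$repo/lorem.txt" > "$repo/fileB"
git -C "$repo" rm -q lorem.txt
git -C "$repo" add fileA fileB
commit "split lorem.txt"

# Print every source file that appears in more than one copy/rename record.
split_sources=$(git -C "$repo" diff --name-status -C30 HEAD^ HEAD |
    awk -F'\t' '$1 ~ /^[CR]/ { total[$2]++ }
                END { for (f in total) if (total[f] > 1) print f }')
echo "split sources: $split_sources"
rm -rf "$repo"
```

Swapping the second commit for other shapes of change (for instance, copying one file to two new locations on purpose) would show which legitimate commits the hook also rejects.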

