
A file was split into 2 other files in one commit: is there a way in git to see what went where?

Tags:

git

My problem:

I am a code reviewer, and I have the following situation in Git:

  • before: a.txt

Then a developer decided to split the content of a.txt into 2 files and add a few changes all in one commit:

  • after: b.txt + few changes and c.txt + few changes

Is there an easy way to see:

  1. what came to b from a?
  2. what came to c from a?
  3. all extra changes apart from just moving stuff?

A specific command would help a lot.

A policy or workflow that prevents problems like this (where there is no way to visually diff the changes) would also help.

UPDATE

Do not be confused by the bounty-awarded answer below. I did not authorize it, because I don't think it's a good answer.

Trident D'Gao asked Jan 19 '18 22:01



1 Answer

Is there an easy way to see:

  • what came to b from a?
  • what came to c from a?
  • all extra changes apart from just moving stuff?

I don't think there's really any way to extract this information other than visually inspecting the diff. However, it looks like we may be able to detect split files using git diff along with the -C option. For example, I start with a file that contains 38 lines, move 24 into one file and 14 into another, and delete the original. git diff --name-status just tells me that I have renamed one file and added another:

R060    lorem.txt       fileA
A       fileB

But if we modify our command line to detect copies:

git diff --name-status -C30 HEAD^

We get:

C060    lorem.txt       fileA
R039    lorem.txt       fileB

The -C30 argument says "consider a file a copy if it is at least 30% similar to another file included in the commit". Note that there is a corresponding -M option that controls rename detection; it defaults to 50%.
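To try this outside a real project, the experiment can be reproduced in a throwaway repository. The following is only a sketch: the file contents, the commit messages, the user name and email, and the variable names status_default and status_copies are all made up for illustration, and the similarity scores git reports will depend on the actual content.

```shell
#!/bin/bash

# Throwaway repository; nothing here touches an existing checkout.
repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q
git config user.name "Example Reviewer"
git config user.email "reviewer@example.invalid"

# Commit 1: a 38-line file whose lines are all distinct.
for i in $(seq 1 38); do
    printf 'line %02d of the original text\n' "$i"
done > lorem.txt
git add lorem.txt
git commit -q -m "add lorem.txt"

# Commit 2: split the file into 24 + 14 lines and delete the original.
head -n 24 lorem.txt > fileA
tail -n 14 lorem.txt > fileB
git rm -q lorem.txt
git add fileA fileB
git commit -q -m "split lorem.txt"

# Default thresholds: the smaller piece is typically reported as a plain addition.
status_default=$(git diff --name-status HEAD^)
# With -C30, both new files can be traced back to lorem.txt.
status_copies=$(git diff --name-status -C30 HEAD^)

printf 'default:\n%s\n\nwith -C30:\n%s\n' "$status_default" "$status_copies"
```

Lowering the number after -C makes the detection more permissive, which also makes unrelated files more likely to be reported as copies.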

A policy or workflow that prevents problems like this would also help.

What exactly are you trying to prevent? There's not really any way to distinguish "I split a file into two new files" from "I deleted a file and created two new files".

You could in theory prevent commits that both introduce new files and modify existing files. That would be relatively easy with a pre-receive hook, for example. But that's such a common situation, I'm not sure you'd want to do this in practice.

For the above, a pre-receive hook like the following might work:

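#!/bin/bash

# Reject the push if any incoming commit both adds files and changes
# (modifies, deletes, or renames) existing ones.
while read -r old new ref; do
        # An all-zero new ID means the ref is being deleted: nothing to check.
        [[ "$new" =~ ^0+$ ]] && continue

        # An all-zero old ID means a new ref: check only commits that are
        # not already reachable from existing refs.
        if [[ "$old" =~ ^0+$ ]]; then
                range=("$new" --not --all)
        else
                range=("$old..$new")
        fi

        for commit in $(git rev-list "${range[@]}"); do
                has_new=0
                has_mod=0
                while read -r type name; do
                        if [ "$type" = "A" ]; then
                                has_new=1
                        elif [ -n "$type" ]; then
                                has_mod=1
                        fi
                done < <(git show --name-status --format='' "$commit")

                if [ "$has_new" = 1 ] && [ "$has_mod" = 1 ]; then
                        echo "ERROR: commits may not both create and modify files" >&2
                        exit 1
                fi
        done
done

exit 0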

We could alternatively use our "split detection", discussed earlier, and implement something like:

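#!/bin/bash

while read -r old new ref; do
    # Skip ref creation and deletion: one side is the all-zero ID, so
    # there is nothing to diff against.
    if [[ "$old" =~ ^0+$ || "$new" =~ ^0+$ ]]; then
        continue
    fi

    # Count how often each path is the source of a rename (R) or copy (C).
    # Fields are tab-separated: status, old name, new name.
    git diff --name-status -C30 "$old" "$new" |
        awk -F'\t' '
            $1 ~ /^[RC]/ {total[$2]++}
            END {for (i in total) if (total[i] > 1) exit 1}
        '

    if [ $? -ne 0 ]; then
        echo "ERROR: detected a split file" >&2
        exit 1
    fi
done

exit 0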

This exits with an error if any file shows up as the "old name" for a file more than once. Trying to push to a repository that uses this pre-receive hook, with the example from the first part of this answer, gets me:

$ git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 1.46 KiB | 1.46 MiB/s, done.
Total 5 (delta 0), reused 0 (delta 0)
remote: ERROR: detected a split file
To upstream
 ! [remote rejected] master -> master (pre-receive hook declined)

Maybe that helps? Without extensive testing I would worry about false positives with this solution.
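One way to probe for false positives before installing anything is to run the split check over canned --name-status output. This is a sketch under assumptions: has_split_source is a hypothetical helper name, the sample listings are invented (the first reuses the lorem.txt output shown earlier), and the awk filter is a variant that counts only rename (R) and copy (C) lines, so a file that is merely modified and also copied once is not flagged.

```shell
#!/bin/bash

# Hypothetical helper: exits 0 when some path on stdin is the source of
# more than one rename (R) or copy (C) line, and 1 otherwise.
# Input is tab-separated --name-status output: status, old name, new name.
has_split_source() {
    awk -F'\t' '
        $1 ~ /^[RC]/ { if (++seen[$2] > 1) found = 1 }
        END { exit !found }
    '
}

# Invented sample listings.
split_sample=$'C060\tlorem.txt\tfileA\nR039\tlorem.txt\tfileB'
plain_sample=$'M\tREADME\nA\tdocs/new.txt\nR100\told.txt\tnew.txt'
modified_and_copied=$'M\tlorem.txt\nC060\tlorem.txt\tfileA'

for name in split_sample plain_sample modified_and_copied; do
    if printf '%s\n' "${!name}" | has_split_source; then
        echo "$name: looks like a split"
    else
        echo "$name: no split detected"
    fi
done
```

Feeding it the output of git diff --name-status -C30 for a few past commits gives a rough idea of how often legitimate commits would be rejected.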

larsks answered Oct 21 '22 01:10