
Show similarity index of two files in a Git repository

Tags:

git

Is it possible to show the similarity index of two files in a Git repository using git diff? According to the man pages, git diff -p may produce patches with this information in certain cases, but the output of the following command, for example, does not contain the similarity index:

git diff -p --no-index a b

Where a and b are two files known to the repository. Is it possible to let Git calculate and report this similarity index between two existing files in a repository?

Ton van den Heuvel asked Sep 17 '25 09:09

1 Answer

Unfortunately, no. Or, more precisely, not with any existing front-end command. Git reports a similarity index only when it detects a rename or copy, so the only way to get Git to compute one for two files is to create two tree objects between which it seems possible, to Git, that the file was renamed.

We can, however, do just that. Here's the method:

  1. Choose a temporary index file name and point GIT_INDEX_FILE at it.
  2. Add the first file to the temporary index and write out a tree, saving its hash ID.
  3. Remove the first file from the temporary index and add the second file; write out a new tree as before.
  4. Diff the two trees with --find-renames=01 (a 1% threshold).

(Using a rename threshold of 00 does not work: this just disables rename-detection.)
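To make the four steps concrete, here is a minimal sketch that runs them by hand in a throwaway repository. The scratch repository, the file contents, and the index name tmp-index are made up for illustration; only the Git commands themselves come from the method above, plus -r so that files in subdirectories would also be compared.

```shell
#!/bin/sh
# Sketch only: walk through the four steps in a scratch repository.
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' 0
cd "$repo"
git init -q .

# Two hypothetical files that share most of their content.
printf 'one\ntwo\nthree\nfour\n' > a
printf 'one\ntwo\nthree\nFOUR\n' > b

# Step 1: choose a temporary index file name. The file must not exist
# yet, because Git refuses to read an empty file as an index.
GIT_INDEX_FILE=$repo/.git/tmp-index
export GIT_INDEX_FILE

# Step 2: index holding only "a", written out as a tree.
git add a
t1=$(git write-tree)

# Step 3: start again from an empty index, this time holding only "b".
rm -f "$GIT_INDEX_FILE"
git add b
t2=$(git write-tree)

# Step 4: diff the two trees with a 1% rename threshold.
status=$(git diff-tree -r --name-status --find-renames=01 "$t1" "$t2")
echo "$status"
```

If rename detection pairs the two files, the output is a single line of the form R<score>, a tab, a, a tab, b, where the three-digit score is the similarity index in percent.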

I wrapped this up into a script, which appears below. Save it as an executable file named git-similarity somewhere in your $PATH (I use $HOME/scripts/ as a directory containing executable scripts that run on any architecture) and you can run git similarity a b.

(This is lightly tested.)


#! /bin/sh
#
# git-similarity: script to compute similarity of two files

. git-sh-setup # for die() etc

# $'\t' is not understood by every /bin/sh; printf is portable
TAB=$(printf '\t')

# should probably use OPTIONS_SPEC, but not yet
usage()
{
    echo "usage: git similarity file1 file2"
}

case $# in
2) ;;
*) usage 1>&2; exit 1;;
esac

test -f "$1" || die "cannot find file $1, or not a regular file"
test -f "$2" || die "cannot find file $2, or not a regular file"
test "x$1" != "x$2" || die "file names $1 and $2 are identical"

TF=$(mktemp) || exit 1

trap 'rm -f "$TF"' 0 1 2 3 15
export GIT_INDEX_FILE=$TF

# create a tree holding (just) the argument file
maketree() {
    # start from no index at all: git rejects an empty index file
    rm -f "$TF"
    git add "$1" || exit 1
    git write-tree || exit 1
}

# Use git diff-tree here for repeatability.  We expect output of
# the form Rnnn$TAB$file1$TAB$file2, but if we get two lines,
# with D and A, we'll just print 000 here.
# -r recurses into subtrees, so files in subdirectories are compared too.
print_similarity() {
    set -- $(git diff-tree -r --name-status --find-renames=01 "$1" "$2")
    case "$1" in
    R*) echo "${1#R}";;
    *) echo "000";;
    esac
}

# an "exit 1" inside $(...) only leaves the subshell, so check here too
h1=$(maketree "$1") || exit 1
h2=$(maketree "$2") || exit 1
print_similarity "$h1" "$h2"

torek answered Sep 19 '25 22:09
