
How to get an edit-distance between two commits?

I'm looking for a way to compute a good edit distance between the contents of any two commits.

The best I've found is to derive something from the output of

git diff <commit-ish> <commit-ish> --numstat

...but anything I can come up with using this method would be a very crude proxy for edit distance.
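To make the crude proxy concrete, here is a minimal sketch that sums the added and deleted line counts reported by --numstat. The function name numstat_total is made up for illustration, and the result counts changed lines, not character edits, so it only loosely tracks a real edit distance.

```shell
# Sum the added and deleted line counts from `git diff --numstat` output
# read on stdin. Binary files are reported as "-<TAB>-<TAB>path" and are skipped.
# numstat_total is a hypothetical helper name, not a git command.
numstat_total() {
    awk -F '\t' '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}

# Usage inside a repository (the commit names are placeholders):
#   git diff --numstat HEAD~3 HEAD | numstat_total
```

A single changed character and a fully rewritten line both count the same here, which is why this is only a rough stand-in.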

Is there anything better?

asked Oct 31 '22 22:10 by kjo



1 Answer

I think your best bet here is to use an outside tool for calculating Levenshtein distance, such as Perl's Text::Levenshtein module.

For example, somewhat hackily:

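#!/bin/sh
# Usage: ./lev_dist_git_commits.sh <commit-ish> <commit-ish>

COMMIT_ONE=$1
COMMIT_TWO=$2

# --name-only prints one path per line; --no-renames lists a renamed file
# under both its old and its new path instead of "old => new".
git diff --no-renames --name-only "$COMMIT_ONE" "$COMMIT_TWO" | {
    TOTAL_LEV_DIST=0
    while IFS= read -r FILE; do

        # A file missing from one commit (added or deleted) counts as empty.
        CONTENTS_ONE=$(git show "$COMMIT_ONE:$FILE" 2>/dev/null)
        CONTENTS_TWO=$(git show "$COMMIT_TWO:$FILE" 2>/dev/null)

        # Contents are passed as arguments, so very large files can hit the
        # system's argument-length limit.
        LEV_DIST=$(perl -MText::Levenshtein -e 'my ($str1, $str2) = @ARGV; print Text::Levenshtein::distance($str1, $str2);' "$CONTENTS_ONE" "$CONTENTS_TWO")

        TOTAL_LEV_DIST=$((TOTAL_LEV_DIST + LEV_DIST))

    done
    echo "$TOTAL_LEV_DIST"
}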

Which seems to do the trick:

$ git diff HEAD HEAD~3 --numstat
0       5       Changes
1       3       dist.ini
$ ./lev_dist_git_commits.sh HEAD HEAD~3
230
$ ./lev_dist_git_commits.sh HEAD HEAD
0

Note: You can install Text::Levenshtein::XS for a speed boost if you have a C compiler and if speed is important. On my computer that reduced the time from 1.5s to 0.05s.
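If installing a Perl module is not an option, the same metric can be sketched with awk alone. This is only an illustration of the standard two-row dynamic-programming formulation of Levenshtein distance, not what Text::Levenshtein does internally. The function name lev and the variables LEV_A and LEV_B are made up for this example. It runs in time proportional to the product of the two lengths, and because the strings travel through the environment it is only suitable for small inputs.

```shell
# Levenshtein distance between two strings, using only POSIX awk.
# lev, LEV_A and LEV_B are hypothetical names chosen for this sketch.
# The strings are passed via the environment so that awk does not
# interpret backslash escapes in them (as it would with -v).
lev() {
    LEV_A=$1 LEV_B=$2 awk 'BEGIN {
        a = ENVIRON["LEV_A"]; b = ENVIRON["LEV_B"]
        la = length(a); lb = length(b)
        # prev[j] = distance between the empty prefix of a and b[1..j]
        for (j = 0; j <= lb; j++) prev[j] = j
        for (i = 1; i <= la; i++) {
            cur[0] = i
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                m = prev[j] + 1                                  # deletion
                if (cur[j-1] + 1 < m) m = cur[j-1] + 1           # insertion
                if (prev[j-1] + cost < m) m = prev[j-1] + cost   # substitution
                cur[j] = m
            }
            for (j = 0; j <= lb; j++) prev[j] = cur[j]
        }
        print prev[lb]
    }'
}

lev kitten sitting   # prints 3
```

Whether awk counts bytes or characters here depends on the awk implementation and locale, so results on non-ASCII text may differ from the Perl module's.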

answered Nov 10 '22 20:11 by Kaoru
