 

Quantifying the amount of change in a git diff?

Tags: git, word-count

I use git for a slightly unusual purpose: it stores my text as I write fiction. (I know, I know... geeky.)

I am trying to keep track of productivity, and want to measure the degree of difference between subsequent commits. The writer's proxy for "work" is "words written", at least during the creation stage. I can't use straight word count as it ignores editing and compression, both vital parts of writing. I think I want to track:

 (words added)+(words removed) 

which will double-count (words changed), but I'm okay with that.

It'd be great to type some magic incantation and have git report this distance metric for any two revisions. However, git diffs are patches, which show entire lines even if you've only twiddled one character on the line; I don't want that, especially since my 'lines' are paragraphs. Ideally I'd even be able to specify what I mean by "word" (though \W+ would probably be acceptable).
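Outside of git, the metric itself is easy to sketch. The following minimal Python example assumes both revisions are available as plain strings. It finds "words" with a configurable regex (the default `\w+` is equivalent to treating `\W+` as the separator), aligns the two word lists with `difflib.SequenceMatcher`, and sums the sizes of the non-matching spans. The function name `word_change_metric` is hypothetical, and `difflib`'s alignment is not guaranteed to match what git's own diff algorithms would produce.

```python
import difflib
import re


def word_change_metric(old: str, new: str, word_pattern: str = r"\w+") -> int:
    """Return (words added) + (words removed) between two texts.

    A "word" is any match of word_pattern; the default r"\\w+" treats
    runs of non-word characters (\\W+) as separators.
    """
    old_words = re.findall(word_pattern, old)
    new_words = re.findall(word_pattern, new)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' counts both its removed and its added words, so a
        # changed word is double-counted, as the question allows.
        if tag in ("delete", "replace"):
            removed += i2 - i1
        if tag in ("insert", "replace"):
            added += j2 - j1
    return added + removed


old = "The quick brown fox jumps over the lazy dog."
new = "The quick red fox leaps over the dog."
print(word_change_metric(old, new))  # prints 5
```

Passing a different pattern, for example `r"\S+"`, switches to whitespace-separated words, so "don't" counts as one word instead of two.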

Is there a flag to git-diff to give diffs on a word-by-word basis? Alternately, is there a solution using standard command-line tools to compute the metric above?

asked May 20 '10 at 13:05 by Alex Feinman



People also ask

How do I see changes in git diff?

You can run git diff HEAD to compare both staged and unstaged changes with your last commit. You can also run git diff <branch_name1> <branch_name2> to compare the first branch with the second. Order does matter when you're comparing branches.

How do you find the difference between two commits?

To see the changes between two commits, you can use git diff ID1..ID2, where ID1 and ID2 identify the two commits you're interested in, and the connector .. is a pair of dots. For example, git diff abc123..def456 shows the differences between the commits abc123 and def456, while git diff HEAD~1..

What does git diff show you?

The git diff command displays the differences between files in two commits, or between a commit and your working tree. You can see what text has been added to, removed from, and changed in a file. By default, with no arguments, git diff shows unstaged changes: the differences between your working tree and the index (staging area).

How is git diff implemented?

Git offers four diff algorithms, namely Myers, Minimal, Patience, and Histogram, which are used to compute the differences between two versions of the same file, for example in two different commits. Minimal is a variant of Myers that spends extra time to produce the smallest possible diff, and Histogram extends Patience to support low-occurrence common elements.


2 Answers

Building on James' and cornmacrelf's input, I added arithmetic expansion and came up with a few reusable aliases for counting words added, deleted, and duplicated in a git diff:

alias gitwa='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" | wc -w | xargs'
alias gitwd='git diff --word-diff=porcelain origin/master | grep -e "^-[^-]" | wc -w | xargs'
alias gitwdd='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" -e "^-[^-]" | sed -e "s/.//" | sort | uniq -d | wc -w | xargs'
alias gitw='echo $(($(gitwa) - $(gitwd)))'

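Output from gitwa and gitwd is trimmed using the xargs trick: xargs with no command echoes its input, dropping the padding whitespace some wc implementations add.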

The duplicated-words alias (gitwdd) is adapted from Miles' answer.
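As an alternative to the grep pipelines, here is a rough Python sketch that parses the same --word-diff=porcelain output and returns both counts, so the question's (words added) + (words removed) is a single sum. It relies on the porcelain format as documented for git diff: each run starts with +, - or a space, and ~ on its own line marks a newline. Tracking whether we are inside a hunk keeps the ---/+++ file headers from being counted as content. The function name and the hand-written sample below are illustrative, not real git output.

```python
import re


def count_porcelain_words(diff_text: str, word_pattern: str = r"\S+") -> tuple[int, int]:
    """Return (words_added, words_removed) from `git diff --word-diff=porcelain` text."""
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            # A new file header begins; its ---/+++ lines are not content.
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += len(re.findall(word_pattern, line[1:]))
        elif in_hunk and line.startswith("-"):
            removed += len(re.findall(word_pattern, line[1:]))
        # Unchanged runs (" ") and newline markers ("~") are ignored.
    return added, removed


sample = "\n".join([
    "diff --git a/story.txt b/story.txt",
    "index 1111111..2222222 100644",
    "--- a/story.txt",
    "+++ b/story.txt",
    "@@ -1 +1 @@",
    " The ",
    "-brown",
    "+red",
    " fox jumps over the ",
    "-lazy old ",
    " dog.",
    "~",
])

added, removed = count_porcelain_words(sample)
print(added, removed, added + removed)  # prints 1 3 4
```

In practice you would pass it the stdout of git diff --word-diff=porcelain origin/master, for example captured with subprocess.run(..., capture_output=True, text=True).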

answered Sep 24 '22 at 21:09 by Stoutie



wdiff does word-by-word comparison. Git can be configured to use an external program to do the diffing. Based on those two facts and this blog post, the following should do roughly what you want.
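For a feel of what a word-by-word comparison produces, here is a small Python approximation. It is not wdiff's algorithm: it splits on whitespace, aligns the words with difflib.SequenceMatcher, and marks deletions as [-...-] and insertions as {+...+}, the same markers wdiff uses by default. The function word_by_word and its no_common flag are hypothetical; no_common loosely mirrors the idea of wdiff's -3 option.

```python
import difflib


def word_by_word(old: str, new: str, no_common: bool = False) -> str:
    """Mark removed words as [-...-] and added words as {+...+}.

    Words are whitespace-separated; no_common=True drops unchanged words.
    """
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            if not no_common:
                out.extend(a[i1:i2])
            continue
        if i2 > i1:  # 'delete' or 'replace': words only in old
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # 'insert' or 'replace': words only in new
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_by_word("the quick brown fox", "the quick red fox"))
# prints: the quick [-brown-] {+red+} fox
print(word_by_word("the quick brown fox", "the quick red fox", no_common=True))
# prints: [-brown-] {+red+}
```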

Create a script that discards the extra arguments git diff passes to an external diff program and hands just the two file paths to wdiff. Save the following as ~/wdiff.py or something similar and make it executable.

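#!/usr/bin/env python3
# git invokes an external diff with 7 arguments:
# path old-file old-hex old-mode new-file new-hex new-mode
import subprocess
import sys

# -s prints statistics, -3 suppresses words common to both files.
# wdiff's exit status (1 means "differences found") is ignored, so this
# script exits 0 and git does not abort the diff.
subprocess.call(["wdiff", "-s3", sys.argv[2], sys.argv[5]])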

Tell git to use it.

git config --global diff.external ~/wdiff.py
git diff filename
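If installing wdiff is not an option, the same diff.external hook could run a script that prints the question's metric directly. The sketch below is hypothetical (the file name ~/wordcount_diff.py is an assumption). It follows git's documented calling convention for external diff programs (7 arguments: path old-file old-hex old-mode new-file new-hex new-mode, or just the path for unmerged files), counts whitespace-separated words with difflib, and prints one line per changed file.

```python
#!/usr/bin/env python3
"""Hypothetical ~/wordcount_diff.py: a git external diff that prints
(words added) + (words removed) per file instead of a patch."""
import difflib
import sys


def metric(old: str, new: str) -> int:
    a, b = old.split(), new.split()
    total = 0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # removed words (old range) plus added words (new range)
            total += (i2 - i1) + (j2 - j1)
    return total


def main(argv: list[str]) -> None:
    # Expect: script path old-file old-hex old-mode new-file new-hex new-mode
    if len(argv) != 8:
        # e.g. unmerged paths, where git passes only the path
        return
    path, old_file, new_file = argv[1], argv[2], argv[5]
    with open(old_file, encoding="utf-8", errors="replace") as f:
        old = f.read()
    with open(new_file, encoding="utf-8", errors="replace") as f:
        new = f.read()
    print(f"{path}: {metric(old, new)}")


if __name__ == "__main__":
    # Falling off the end exits with status 0, which git expects from
    # an external diff; a non-zero status would make git report an error.
    main(sys.argv)
```

After making it executable, you could use it for a single command with GIT_EXTERNAL_DIFF="$HOME/wordcount_diff.py" git diff HEAD~1, or register it with git config diff.external as above; git diff --no-ext-diff bypasses it when you want a normal patch again.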
answered Sep 22 '22 at 21:09 by Edward Dale
