cloc enables one to count the number of lines of code stored in a directory per language per type (blank, comment, or code).
git blame enables one to see which parts of a file belong to whom.
I'm looking for a way to combine both so that one gets a (three-dimensional) matrix that lists the lines of code per type per language per user.
Are there elegant built-in ways to do this, or should one "scrape" the "blame" parts of each user (by running grep after git blame) and run cloc on them to calculate the table for each user?
EDIT:
Naive approach (based on the comment of @Jubobs):
Run grep "^[^(]*([^)]*)" on the git blame output to capture the list of all users, and retrieve the unique ones with sort and uniq.
For each user, run grep "^[^(]*($user)" so that only the lines of that user remain.
This is more or less how to generate the desired output. But as one can see, this approach does a lot of copying (or at least storing in memory), and one could actually compute the lines per user by running over the file once instead of multiple times.
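The single-pass idea could be sketched roughly as follows. This is only a minimal sketch, not a built-in feature: it assumes the input comes from git blame --line-porcelain (which repeats the author header for every line and prefixes each content line with a tab), the function name count_by_author is made up for illustration, and it only separates blank from non-blank lines, since telling comments from code is language-specific and is what cloc is for.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): reads `git blame --line-porcelain`
# output on stdin and prints "author<TAB>blank<TAB>nonblank" per author,
# visiting every line exactly once.
count_by_author() {
  awk '
    # Header line naming the author of the next content line
    /^author / { author = substr($0, 8); next }
    # Content lines are prefixed with a single tab
    /^\t/ {
      if ($0 ~ /^\t[ \t]*$/) blank[author]++
      else nonblank[author]++
      seen[author] = 1
    }
    END {
      for (a in seen) printf "%s\t%d\t%d\n", a, blank[a], nonblank[a]
    }
  ' | sort
}

# Usage (inside a git repository; the path is a placeholder):
# git blame --line-porcelain -- path/to/file | count_by_author
```

Grouping by language on top of this would still need the file extension, and the comment/code split would still need something like cloc.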
Desired output:
something like:
+--------+--------------------------------+--------------------------------+
| User   | C#                             | XML                            |
+--------+-------+-------+---------+------+-------+-------+---------+------+
|        | files | blank | comment | code | files | blank | comment | code |
+--------+-------+-------+---------+------+-------+-------+---------+------+
| Foo    |    12 |    75 |     148 | 2711 |     2 |    42 |       0 |    0 |
| Bar    |   167 |  1795 |    1425 |    2 |    16 |     0 |     512 | 1678 |
+--------+-------+-------+---------+------+-------+-------+---------+------+
| Total  |   179 |  1870 |    1573 | 2713 |    18 |    42 |     512 | 1678 |
+--------+-------+-------+---------+------+-------+-------+---------+------+
The git blame command examines the contents of a file line by line and shows when each line was last modified and who the author of that modification was. The output format of git blame can be altered with various command-line options.
Blaming only a limited line range: sometimes you don't need to blame the entire file, only a limited range of lines. git blame provides the -L option for that; it takes the start line and the end line, separated by a comma.
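As a small illustration of the -L range option described above, here is a hedged sketch. The wrapper name blame_range and the example path are made up for illustration; only git blame -L start,end is git's own syntax.

```shell
#!/bin/bash
# Hypothetical wrapper (name is illustrative): blame only lines START..END of FILE.
blame_range() {
  local file=$1 start=$2 end=$3
  git blame -L "${start},${end}" -- "$file"
}

# Usage (inside a git repository; the path is a placeholder):
# blame_range src/Program.cs 10 20
```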
In short, git blame shows which author and which commit are responsible for the latest change to each line of a file.
This is an older question, but it piqued my interest, so I started playing around with trying to solve it. This doesn't spit out a nice report, but it does put the data in a CSV with three columns: file extension, email of committer, and number of lines this user has committed for this file type. It also doesn't give the blank, comment, and code line counts like cloc does. If I have time I'll try getting all of that to work nicely, but I thought this might be a 'good enough' solution, or at least get you started in the right direction.
#!/bin/bash
# Temporary work files (removed automatically when the script exits)
LIST_OF_GIT_FILES=$(mktemp)
GIT_BLAME_COMBINED_RESULTS=$(mktemp)
OUTPUT=$(mktemp)
SUMMARY=code-summary.csv
trap 'rm -f "$LIST_OF_GIT_FILES" "$GIT_BLAME_COMBINED_RESULTS" "$OUTPUT"' EXIT

# Blame every tracked file; -e prints author emails, -f prints the file name on each line
git ls-files > "$LIST_OF_GIT_FILES"
while IFS= read -r p; do
  git blame -e -f -- "$p" >> "$GIT_BLAME_COMBINED_RESULTS"
done < "$LIST_OF_GIT_FILES"

# Reduce each blamed line to "extension,email" (assumes paths contain no spaces or commas)
awk '{print $2 "," $3}' "$GIT_BLAME_COMBINED_RESULTS" | tr -d '(<>' | awk -F ',' '{n = split($1, a, "."); print a[n] "," $2}' > "$OUTPUT"

# Count identical pairs and write "extension,email,lines" rows
sort "$OUTPUT" | uniq -c | awk '{print $2 "," $1}' | sort > "$SUMMARY"
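If a per-user total across all file types is wanted, the resulting CSV could be folded once more. This is just a sketch under the assumption that the rows look like extension,email,lines as described above; the function name totals_per_author is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): reads "extension,email,lines"
# rows on stdin and prints "email,total", summing the counts over all file types.
totals_per_author() {
  awk -F ',' '{ total[$2] += $3 } END { for (e in total) print e "," total[e] }' | sort
}

# Usage: totals_per_author < code-summary.csv
```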