How to generate Git statistics of changes per filetype?

Tags: git

How many lines of code were added and deleted per filetype in a git repository? Preferably, being able to request these statistics for the past X commits. So say I would request this for the past 100 commits, my expected outcome would be something like this:

.css, 100 files changed,  2879 insertions(+), 1134 deletions(-)
.js,   17 files changed,   415 insertions(+),  502 deletions(-)
.php,   6 files changed,   478 insertions(+),  176 deletions(-)
.py,   10 files changed,   156 insertions(+),   56 deletions(-)

This must be possible somehow, but I cannot find a solution. I have read this question and also tried using Gitstats and RepoExplorer.

Dirk J. Faber asked Mar 03 '23 11:03


2 Answers

There are two ways to compute the data you want. One is to compute the difference across the commit range as a whole; the other is to compute the difference for each commit individually and sum the values. The latter total may be larger than the former if a file is touched multiple times. Since you didn't specify, I'll show you both.

To compute the former, which is simpler, you can use a command like the following:

git diff --numstat BASE_COMMIT.. | ruby -rset -e '
  # Maps extension => [insertions, deletions, set of file paths]
  x = {}
  while gets
    # Each numstat line is: added<TAB>deleted<TAB>path
    line = $_.chomp.split("\t")
    # File.extname only looks at the last path component
    type = File.extname(line[2])
    next if type.empty?
    x[type] ||= [0, 0, Set.new]
    # Binary files report "-", which to_i turns into 0
    2.times { |i| x[type][i] += line[i].to_i }
    x[type][2] << line[2]
  end
  x.sort_by { |(k, v)| k }.each do |(type, (add, del, set))|
    puts "#{type} #{set.length} files changed, #{add} insertions(+), #{del} deletions(-)"
  end'

This uses git diff --numstat to compare BASE_COMMIT with the current HEAD and sums the per-file counts by file extension. To cover the last X commits, use HEAD~X as BASE_COMMIT. Note that this ignores files without a suffix and counts binary files as having no lines added or removed. It also doesn't produce aligned columns, but you can add that if you like. You can also use a different language, or a script instead of a one-liner.
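As a rough sketch of the "script instead of a one-liner" variant with aligned columns: the function names numstat_by_extension and format_stats, the sample input, and the column widths below are all made up for illustration, so treat this as one possible shape rather than the definitive script.

```ruby
require "set"

# Aggregate numstat-style text ("added\tdeleted\tpath" per line) by extension.
# Returns { ".ext" => [insertions, deletions, Set of paths] }.
def numstat_by_extension(numstat_text)
  stats = {}
  numstat_text.each_line do |raw|
    added, deleted, path = raw.chomp.split("\t", 3)
    next if path.nil?                 # skip blank or malformed lines
    ext = File.extname(path)
    next if ext.empty?                # skip files without a suffix
    entry = (stats[ext] ||= [0, 0, Set.new])
    entry[0] += added.to_i            # "-" (binary file) becomes 0
    entry[1] += deleted.to_i
    entry[2] << path
  end
  stats
end

# Render one line per extension, sorted by extension, with padded columns.
def format_stats(stats)
  stats.sort_by { |ext, _| ext }.map do |ext, (add, del, files)|
    format("%-6s %4d files changed, %6d insertions(+), %6d deletions(-)",
           "#{ext},", files.size, add, del)
  end
end

# Hypothetical sample input standing in for real `git diff --numstat` output.
sample = "10\t2\tsrc/app.js\n3\t1\tsrc/util.js\n-\t-\tlogo.png\n5\t0\tREADME\n7\t4\tstyle.css\n"
puts format_stats(numstat_by_extension(sample))
```

To run it against a repository, you could replace the sample string with $stdin.read and pipe the git diff --numstat output into the script.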

If you want to do the sum of all individual commits, then replace the git diff invocation with this:

git rev-list BASE_COMMIT.. | xargs -I{} git diff --numstat {}^..{}
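
To see why the per-commit sum can exceed the whole-range figure, here is a small sketch with a made-up history (the helper name totals and both numstat strings are hypothetical, not output from a real repository): one commit adds 10 lines to a file and a later commit deletes 4 of them again.

```ruby
# Sum the added/deleted columns of numstat-style text ("added\tdeleted\tpath").
def totals(numstat_text)
  numstat_text.each_line.reduce([0, 0]) do |(add, del), line|
    a, d, _path = line.chomp.split("\t", 3)
    [add + a.to_i, del + d.to_i]
  end
end

# Assumed output of the per-commit pipeline: one numstat line per commit.
per_commit = "10\t0\tapp.js\n0\t4\tapp.js\n"
# Assumed output of a single diff over the whole range: only the net change.
whole_range = "6\t0\tapp.js\n"

p totals(per_commit)   # => [10, 4]
p totals(whole_range)  # => [6, 0]
```

Both views are legitimate; the per-commit sum measures churn, while the range diff measures the net difference between two snapshots.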

bk2204 answered Mar 05 '23 14:03


I would separate this problem into subproblems:

Raw log magic

I would start from a point where the problem is relatively easy.

If you run

git log

then you see a list of commit summaries. Of course, this is not friendly - yet.

Now, let's limit our domain, let's say we are interested in a statistic for the last 100 commits:

git log -100

Now the number of commits in question is appropriate. But we still don't see adds and removals, let's remedy that:

git log -100 --stat

Starts to be better, right? Let's improve it further:

git log -5 --stat --format=""

Much, much better. Now, for each commit you have "useful lines", that is, lines containing the number of changes and a last line of the format of

9 files changed, 189 insertions(+), 1 deletion(-)

Basically, any line containing "files changed" or "file changed" is a per-commit summary and should be ignored, unless you actually have a file with that name. All the other lines are useful raw input.
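
One way to do that filtering, sketched in Ruby: the constant SUMMARY, the helper per_file_lines, and the sample text are hypothetical names and data. It assumes per-file lines always contain a "|" separator, and it matches the summary against the whole line so that a file whose name merely contains "files changed" is not dropped.

```ruby
# Matches a whole summary line such as
# " 9 files changed, 189 insertions(+), 1 deletion(-)".
SUMMARY = /\A\s*\d+ files? changed(, \d+ insertions?\(\+\))?(, \d+ deletions?\(-\))?\s*\z/

# Keep only the per-file lines, which look like " path | count graph".
def per_file_lines(stat_text)
  stat_text.each_line.map(&:chomp).select do |line|
    line.include?("|") && !line.match?(SUMMARY)
  end
end

# Hypothetical sample standing in for `git log --stat --format=""` output.
sample = <<~STAT
   src/app.js | 12 ++++++++----
   style.css  |  3 ++-
   2 files changed, 10 insertions(+), 5 deletions(-)
STAT
puts per_file_lines(sample)
```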

Algorithm for statistics

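You need a data structure that maps each file type (key) to a pair of numbers (value). The first number is the number of minuses, the second is the number of pluses. Keep in mind that --stat scales the +/- graph to the terminal width, so counting characters is only exact for small changes; --numstat prints the exact numbers. Pseudocode: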

For Each ln In Lines Do
    If (Not ln.Replace("files", "file").Contains("file changed")) And ln.Contains("|") Then
        FileName = ln.Substring(0, ln.IndexOf("|")).Trim()
        If FileName.Contains(".") Then
            FileExtension = FileName.Substring(FileName.LastIndexOf(".") + 1)
            If (Not Extensions.Has(FileExtension)) Then
                Extensions(FileExtension) = [0, 0]
            End If
            UsefulSubstring = ln.Substring(ln.LastIndexOf(" ") + 1)
            For Each char In UsefulSubstring Do
                If char = '+' Then
                    Extensions(FileExtension)[1] = Extensions(FileExtension)[1] + 1
                Else If char = '-' Then
                    Extensions(FileExtension)[0] = Extensions(FileExtension)[0] + 1
                End If
            End For
        End If
    End If
End For

This algorithm builds the per-extension totals, which you then need to print to the console in the format you prefer. You can feed the program whatever input you like, and you can even embed the git log command into it. It's not a very big task, so if you invest a few hours into this, maybe less, you will have the result you need.
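
For comparison, a minimal Ruby sketch of the same idea (the function name tally_stat_graph and the sample input are hypothetical). It assumes each per-file line has the form " path | count graph". Because git scales the graph to the terminal width, the tallies are only exact for small changes.

```ruby
# Tally "+" and "-" graph characters per extension from --stat style text.
# Returns { "ext" => [minuses, pluses] }, the pair described above.
def tally_stat_graph(stat_text)
  extensions = Hash.new { |hash, key| hash[key] = [0, 0] }
  stat_text.each_line do |line|
    name, change = line.chomp.split("|", 2)
    next if change.nil?                  # summary and blank lines have no "|"
    ext = File.extname(name.strip).delete_prefix(".")
    next if ext.empty?                   # skip files without a suffix
    graph = change.split.last.to_s       # e.g. "++++--"; "bytes" for binary files
    extensions[ext][0] += graph.count("-")
    extensions[ext][1] += graph.count("+")
  end
  extensions
end

# Hypothetical sample standing in for `git log -100 --stat --format=""` output.
sample = " src/app.js | 12 ++++++++----\n README | 2 +-\n" \
         " 2 files changed, 9 insertions(+), 5 deletions(-)\n"
p tally_stat_graph(sample)   # => {"js"=>[4, 8]}
```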

Lajos Arpad answered Mar 05 '23 16:03