 

What tools or techniques are available to "datamine" my mercurial repository?

We have a 2,000,000-line application in Mercurial. Obviously there is a lot of valuable information inside this repository.

Are there any tools or techniques to dig out some of that information?

For instance, over the history of the project, which five files have seen the most changes? Which five files differ the most from what they were one year ago? Have any particular lines of code seen a lot of churn?

I'm interested in that sort of thing and more.

Is there a way to extract this kind of information from our repository?

Nick Hodges asked Jun 18 '12 15:06


2 Answers

I don't know of any tools specifically made for doing this, but Mercurial's log templates are very powerful for getting data out of the system. I've done a bit of this sort of analysis in the past, and my approach was:

  1. Use hg log to dump commits to some convenient format (XML in my case)
  2. Write a script to import the XML into something queryable (a database, or just work from the XML directly if it's not too big)

Here's an example hg log command to get you going:

mystyle.txt: (template)

changeset = '<changeset>\n<user>{author|user}</user>\n<date>{date|rfc3339date|escape}</date>\n<files>\n{file_mods}{file_adds}{file_dels}</files>\n<rev>{node}</rev>\n<desc>{desc|strip|escape}</desc>\n<branch>{branches}</branch><diffstat>{diffstat}</diffstat></changeset>\n\n'
file_mod = '<file action="modified">{file_mod|escape}</file>\n'
file_add = '<file action="added">{file_add|escape}</file>\n'
file_del = '<file action="deleted">{file_del|escape}</file>\n'

Example invocation using template and date range:

hg --repository /path/to/repo log -d "2012-01-01 to 2012-06-01" --no-merges --style mystyle.txt
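
As a rough sketch of step 2, the output of the command above could be loaded with Python's standard library and counted per file, which addresses the "which five files have seen the most changes" part of the question. The function name `top_changed_files` and the `<log>` wrapper element are illustrative assumptions, not part of Mercurial or of the template above. The template emits a flat sequence of `<changeset>` elements with no root, so the sketch wraps them before parsing.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def top_changed_files(log_xml, n=5):
    """Return the n most frequently touched files as (path, count) pairs.

    log_xml is the text printed by `hg log --style mystyle.txt`.
    (Hypothetical helper; the name is an assumption for illustration.)
    """
    # The template prints <changeset> elements one after another with no
    # enclosing element, so add a root to make the document well-formed.
    root = ET.fromstring("<log>" + log_xml + "</log>")

    counts = Counter()
    for changeset in root.iter("changeset"):
        # Every <file> entry (modified, added, or deleted) counts as one change.
        for file_el in changeset.iter("file"):
            counts[file_el.text] += 1

    # most_common sorts by count, highest first.
    return counts.most_common(n)
```

One way to use it would be to redirect the hg log output to a file, read that file into a string, and pass the string to `top_changed_files`. Filtering on the `action` attribute, or summing the `<diffstat>` values instead of counting entries, would be natural variations.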
overthink answered Sep 28 '22 21:09


Try the hg churn extension. It ships with Mercurial but has to be enabled in the [extensions] section of your hgrc. One thing I like to use it for is a monthly bar graph of commits, like this:

> hg churn -csf '%Y-%m'

2014-02     65 *************************************
2014-03     22 *************
2014-04     52 ******************************
2014-05     67 ***************************************
2014-06     31 ******************
2014-07     29 *****************
2014-08     29 *****************
2014-09     61 ***********************************
2014-10     36 *********************
2014-11     23 *************
2014-12     32 ******************
2015-01     60 ***********************************
2015-02     20 ************

(You might want to set up an alias if you find yourself using the command often.)
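
If the numbers are needed as data rather than as a bar graph, the churn output can be parsed with a few lines of Python. This is only a sketch: `parse_churn` and `monthly_commits` are hypothetical helper names, and the regular expression assumes the key contains no spaces, which holds for the '%Y-%m' date format used above but not necessarily for author names.

```python
import re
import subprocess

# Matches lines such as "2014-02     65 ******": key, count, optional bar.
_CHURN_LINE = re.compile(r"^(\S+)\s+(\d+)\s*\**\s*$")


def parse_churn(output):
    """Turn churn output text into a list of (key, count) pairs."""
    rows = []
    for line in output.splitlines():
        match = _CHURN_LINE.match(line)
        if match:  # skip blank lines and anything that is not a data row
            rows.append((match.group(1), int(match.group(2))))
    return rows


def monthly_commits(repo_path):
    """Run `hg churn -c -s -f '%Y-%m'` in repo_path and parse the result.

    Assumes hg is on PATH and the churn extension is enabled.
    """
    result = subprocess.run(
        ["hg", "--repository", repo_path, "churn", "-c", "-s", "-f", "%Y-%m"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_churn(result.stdout)
```

The parsing is kept separate from the subprocess call so it can be tried on saved output without a repository at hand.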

fakeleft answered Sep 28 '22 19:09