Git difftool ridiculously slow in Cygwin/MinGW

I noticed that git difftool is very slow. There is a delay of about 1 to 2 seconds between diff invocations.

To benchmark it, I wrote a custom difftool command:

#!/bin/sh
# Print the script name and the two files Git passes in, then exit.
echo "$0" "$1" "$2"

I then configured Git to use this tool in my ~/.gitconfig:

[diff]
    tool = mydiff
[difftool "mydiff"]
    prompt = false
    cmd = "~/mydiff \"$LOCAL\" \"$REMOTE\""

I tested it on the Git sources:

$ git clone https://github.com/git/git.git
$ cd git
$ git rev-parse HEAD
1bc8feaa7cc752fe3b902ccf83ae9332e40921db
$ git diff HEAD~10 --stat --name-only | wc -l
23

When I time git difftool against commit 259b5e6d33, it is ridiculously slow:

$ time git difftool 259b5
mydiff /dev/null Documentation/RelNotes/2.6.3.txt
...
mydiff /tmp/mY2T6l_upload-pack.c upload-pack.c

real    0m10.381s
user    0m1.997s
sys     0m6.667s

A simpler script that does the same job runs much faster:

$ time git diff --name-only --stat 259b5 | xargs -n1 -I{} sh -c 'git show 259b5:{} > {}.tmp && ~/mydiff {} {}.tmp'
mydiff Documentation/RelNotes/2.6.3.txt Documentation/RelNotes/2.6.3.txt.tmp
mydiff upload-pack.c upload-pack.c.tmp

real    0m1.149s
user    0m0.472s
sys     0m0.821s

What did I miss?

Here are the results I got (wall-clock "real" time, in seconds):

| Method   | Cygwin | Debian | Ubuntu |
| -------- | ------ | ------ | ------ |
| difftool | 10.381 |  2.620 |  0.580 |
| custom   |  1.149 |  0.567 |  0.210 |

For the Cygwin results, I measured 2.8s spent in git-difftool and 7.5s spent in git-difftool--helper. The latter is 98 lines long. I don't understand why it is that slow.

nowox asked Dec 01 '15 19:12


2 Answers

Using some of the techniques found on the msysgit GitHub, I have narrowed this down a bit.

For each file in the diff, git-difftool--helper re-runs the following internal commands:

12:44:46.941239 git.c:351               trace: built-in: git 'config' 'diff.tool'
12:44:47.359239 git.c:351               trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:47.933239 git.c:351               trace: built-in: git 'config' '--bool' 'mergetool.prompt'
12:44:48.797239 git.c:351               trace: built-in: git 'config' '--bool' 'difftool.prompt'
12:44:49.696239 git.c:351               trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:50.135239 git.c:351               trace: built-in: git 'config' 'difftool.bc.path'
12:44:50.422239 git.c:351               trace: built-in: git 'config' 'mergetool.bc.path'
12:44:51.060239 git.c:351               trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:51.452239 git.c:351               trace: built-in: git 'config' 'difftool.bc.cmd'

Notice that, in this particular case, it took roughly 4.5 seconds to execute these. This is a pretty consistent pattern throughout my log.
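
The elapsed time can be read straight off the trace timestamps. Here is a minimal sketch, assuming the log was captured with GIT_TRACE (for example `GIT_TRACE=1 git difftool 259b5 2>trace.log`), that every trace line starts with an HH:MM:SS.ffffff timestamp as in the excerpt above, and that the run does not cross midnight. The function name trace_elapsed is made up for this example and is not part of Git.

```shell
#!/bin/sh
# trace_elapsed: read a GIT_TRACE log on stdin and print the number of
# seconds between the first and the last timestamped line.
# Assumption: timestamps look like 12:44:46.941239 and stay within one day.
trace_elapsed() {
    LC_ALL=C awk '
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ {
            split($1, t, ":")                      # t[1]=HH t[2]=MM t[3]=SS.ffffff
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (!seen) { first = now; seen = 1 }
            last = now
        }
        END { if (seen) printf "%.3f\n", last - first }
    '
}

# Usage (hypothetical file name):
#   trace_elapsed < trace.log
```

Fed the nine trace lines above, this prints 4.511.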

Note too that some of these are duplicates: git config difftool.bc.cmd is called 4 times!
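
To check for the same repetition in your own log, something along these lines may help. It is only a sketch: it assumes the trace was written by GIT_TRACE=1 and that `git config` calls appear in the format shown above, with the queried key as the last quoted word on the line. The name count_config_calls is hypothetical.

```shell
#!/bin/sh
# count_config_calls: read a GIT_TRACE log on stdin and print how many times
# each `git config` key was queried, most frequent first ("COUNT KEY").
# Assumption: lines look like
#   ... trace: built-in: git 'config' ['--bool'] 'some.key'
count_config_calls() {
    grep "trace: built-in: git 'config'" |
        awk '{ print $NF }' |      # last field is the quoted key
        tr -d "'" |                # strip the surrounding quotes
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'     # normalise uniq's padding
}

# Usage (hypothetical file name):
#   count_config_calls < trace.log
```

On the excerpt above, the first line of output is `4 difftool.bc.cmd`, and the other five keys each appear once.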

Now, possible remedies:

  • I cut the execution time for these commands in half by moving all of the diff-related sections to the top of my .gitconfig file. Seriously. It's still noticeable, but now on the order of 2 seconds instead of 4.5.
  • Make sure that your Git folder under Program Files and your user profile (where .gitconfig lives) are both excluded from realtime virus scanning.
  • Fundamentally, Git needs to be more efficient with parsing and getting configuration values. Ideally, it would cache these instead of re-requesting (and reparsing...) from config every time in a loop. Perhaps even cached for the entire command execution.
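
To illustrate the caching idea in the last bullet, here is a rough sketch of what a wrapper script of your own could do. This is not how git-difftool--helper works. It reads the whole configuration once with `git config --list` (a real command) and then answers lookups from memory instead of spawning `git config` for every key. The names load_config_cache, cached_config and CONFIG_CACHE are hypothetical, and the sketch assumes that values contain no embedded newlines and that keys are passed in the lowercased form that --list prints.

```shell
#!/bin/sh
# load_config_cache: one process spawn instead of one per key.
load_config_cache() {
    CONFIG_CACHE=$(git config --list)
}

# cached_config KEY: print the value of KEY from CONFIG_CACHE.
# The last matching entry wins, mirroring how `git config KEY` resolves
# a key that is set in several files. Returns 1 if the key is absent.
cached_config() {
    printf '%s\n' "$CONFIG_CACHE" | awk -v key="$1" '
        {
            i = index($0, "=")                        # first "=" splits name/value
            name = (i > 0) ? substr($0, 1, i - 1) : $0
            if (name == key) {
                # A key listed without "=value" is a boolean true in Git.
                val = (i > 0) ? substr($0, i + 1) : "true"
                found = 1
            }
        }
        END {
            if (!found) exit 1
            print val
        }
    '
}

# Usage:
#   load_config_cache
#   tool=$(cached_config diff.tool)
#   cmd=$(cached_config "difftool.$tool.cmd")
```

Each lookup still starts an awk process, so on Cygwin/MinGW, where process creation is the expensive part, the gain depends on how much cheaper awk is than `git config` on your machine.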
GalacticCowboy answered Sep 19 '22 20:09


git difftool should be slightly faster with Git 2.13 (Q2 2017).
See commit d12a8cf (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8868ba1, 24 Apr 2017)

unpack-trees: avoid duplicate ODB lookups during checkout

(ODB: Object DataBase)

Teach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID.

In operations such as read-tree and checkout, there will likely be many peer directories that have the same OID when the differences between the commits are relatively small.
In these cases we can avoid hitting the ODB multiple times for the same OID.

This patch handles n=2 and n=3 cases and simply copies the data rather than repeating the fill_tree_descriptor().

================

On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
===============
VonC answered Sep 20 '22 20:09