
Why does git stash -p take long to start?

Tags: git, git-stash

In my repo, git diff and git stash both run quickly, in less than a second. However git stash -p takes a good 20 seconds before showing the first hunk. Why could this be?

Tor Klingberg asked Mar 07 '18 10:03




2 Answers

This should improve with Git 2.25.2 (March 2020), which adds code simplification.
See discussion.

See commit 26f924d (07 Jan 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit a3648c0, 22 Jan 2020)

unpack-trees: exit check_updates() early if updates are not wanted

Signed-off-by: Elijah Newren

check_updates() has a lot of code that repeatedly checks whether o->update or o->dry_run are set.

(Note that o->dry_run is a near-synonym for !o->update, but not quite as per commit 2c9078d05bf2 ("unpack-trees: add the dry_run flag to unpack_trees_options", 2011-05-25, Git v1.7.6-rc0).)
In fact, this function almost turns into a no-op whenever the condition

!o->update || o->dry_run

is met.

Simplify the code by checking this condition at the beginning of the function, and when it is true, do the few things that are relevant and return early.

There are a few things that make the conversion not quite obvious:

  • The fact that check_updates() does not actually turn into a no-op when updates are not wanted may be slightly surprising.
    However, commit 33ecf7eb61 (Discard "deleted" cache entries after using them to update the working tree, 2008-02-07, Git v1.5.5-rc0) put the discarding of unused cache entries in check_updates() so we still need to keep the call to remove_marked_cache_entries().
    It's possible this call belongs in another function, but it is certainly needed as tests will fail if it is removed.
  • The original called remove_scheduled_dirs() unconditionally.
    Technically, commit 7847892716 (unlink_entry(): introduce schedule_dir_for_removal(), 2009-02-09, Git v1.6.3-rc0) should have made that call conditional, but it didn't matter in practice because remove_scheduled_dirs() becomes a no-op when all the calls to unlink_entry() are skipped.
    As such, we do not need to call it.
  • When (o->dry_run && o->update), the original would have two calls to git_attr_set_direction() surrounding a bunch of skipped updates.
    These two calls to git_attr_set_direction() cancel each other out and thus can be omitted when o->dry_run is true just as they already are when !o->update.
  • The code would previously call setup_collided_checkout_detection() and report_collided_checkout() even when o->dry_run.
    However, this was just an expensive no-op because setup_collided_checkout_detection() merely cleared the CE_MATCHED flag for each cache entry, and report_collided_checkout() reported which ones had it set.
    Since a dry-run would skip all the checkout_entry() calls, CE_MATCHED would never get set and thus no collisions would be reported.
    Since we can't detect the collisions anyway without doing updates, skipping the collisions detection setup and reporting is an optimization.
  • The code previously would call get_progress() and display_progress() even when (!o->update || o->dry_run).
    This served to show how long it took to skip all the updates, which is somewhat useless.
    Since we are skipping the updates, we can skip showing how long it takes to skip them.
VonC answered Oct 23 '22 16:10


I notice the same problem. It started at least a year ago and has not improved since then. I also use Git on a very big repo. Unfortunately, in my case it also contains a lot of binary data, since it is just a mirror of an SVN repo using git-svn, and my colleagues think it is a good idea to place binary test data in the repo.

No answer, just hints and guesses where to search:

  • It seems the big difference is that with stash -p the function stash_patch is called; otherwise stash_working_tree is called.

  • In stash_patch, child processes are spawned that execute other git commands. One of these is read-tree (see: man git-read-tree). The final command looks like this: GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD. This actually takes no time.

  • The next step is another child process calling GIT_INDEX_FILE=index.stash.<PID> git add--interactive --patch=stash -- <PATH>. This is where all the reads come from and what takes up all the time. Interestingly, calling just GIT_INDEX_FILE=index.stash.<PID> git status after GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD is as expensive as git add--interactive. add--interactive is actually a Perl script implementing add -p. I don't know Perl and had a hard time reading it, but it probably checks the working directory state somehow, using the same code as git status.

  • The basic idea seems to be:

    • Create a temporary index from HEAD
    • Interactively add changes to that index
    • Save the changed temporary index to a tree-ish
  • The expensive part seems to be determining the state of the working directory with respect to the temporary index. Why it is so expensive I don't know. Probably some cached data is invalidated, so Git has to read all the files in the working copy, at least to some extent, to compare them with the temporary index; understanding this requires diving deeper into the internals of git status.
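
The three steps of the basic idea can be tried out by hand. This is only a rough sketch, not what git stash -p literally runs: it assumes a throwaway repository created with mktemp, the file name a.txt and the index name index.stash.demo are made up for illustration, and a plain git add stands in for the interactive add--interactive --patch=stash step.

```shell
# Throwaway repository with one committed file and one unstaged change.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
printf 'one\n' > a.txt
git add a.txt
git commit -q -m init
printf 'two\n' >> a.txt

# Hypothetical name for the temporary index file.
tmp_index="$repo/.git/index.stash.demo"

# 1. Create a temporary index from HEAD.
GIT_INDEX_FILE="$tmp_index" git read-tree HEAD

# 2. Add changes to that index (non-interactive stand-in for the
#    interactive patch selection).
GIT_INDEX_FILE="$tmp_index" git add -- a.txt

# 3. Save the changed temporary index as a tree object.
stash_tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)

head_tree=$(git rev-parse 'HEAD^{tree}')
echo "HEAD tree:  $head_tree"
echo "stash tree: $stash_tree"
```

The regular index is never touched here; only the temporary one receives the change, which is why the resulting tree differs from the tree of HEAD.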

I tried measuring this like this:

GIT_INDEX_FILE=.git/index.stash.test git read-tree HEAD
GIT_TRACE_PERFORMANCE=/tmp/trace_status GIT_INDEX_FILE=.git/index.stash.test git st .

Result looks like this:

20:31:20.439868 read-cache.c:2290       performance: 0.000269090 s:  read cache .git/index.stash.test
20:31:20.441368 preload-index.c:147     performance: 0.001419629 s:   preload index
20:32:15.568433 read-cache.c:1605       performance: 55.128484420 s:  refresh index
20:32:15.568611 diff-lib.c:251          performance: 0.000054503 s:  diff-files
20:32:15.568847 unpack-trees.c:1546     performance: 0.000004362 s:    traverse_trees
20:32:15.568868 unpack-trees.c:447      performance: 0.000008189 s:    check_updates
20:32:15.568874 unpack-trees.c:1643     performance: 0.000040807 s:   unpack_trees
20:32:15.568879 diff-lib.c:537          performance: 0.000079322 s:  diff-index
20:32:15.569115 name-hash.c:600         performance: 0.000197074 s:   initialize name hash
20:32:15.573785 dir.c:2326              performance: 0.004883714 s:  read directory 
20:32:15.574904 read-cache.c:3017       performance: 0.001083674 s:  write index, changed mask = 82
20:32:15.575125 trace.c:475             performance: 55.135763475 s: git command: /usr/lib/git-core/git status .
20:32:15.575421 trace.c:475             performance: 55.136831211 s: git command: git st .

My repo looks like this:

>$ du -hd 1
1,1M    ./.idea
74M     ./code
3,0G    ./.git
2,4G    ./test-data
5,5G    .

The picture is similar when the trace is applied directly to git stash -p:

20:43:55.968088 read-cache.c:1605       performance: 59.716998605 s:  refresh index
20:43:55.969584 trace.c:475             performance: 59.719061140 s: git command: git update-index --refresh

The man page for git update-index says this about --refresh:

USING --REFRESH
       --refresh does not calculate a new sha1 file or bring the index up to date for mode/content changes. But what it does do is to "re-match" the stat information of a file with the index, so that you can refresh the index for a
       file that hasn’t been changed but where the stat entry is out of date.

       For example, you’d want to do this after doing a git read-tree, to link up the stat index details with the proper files.
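
What the man page describes can be observed on a tiny repository. This is a sketch under assumptions: the repository is a throwaway one created with mktemp, file.txt and index.stash.demo are made-up names, and the exact layout of the git ls-files --debug output may differ between Git versions. As far as I can tell, an index freshly filled by read-tree carries no stat information, so the refresh has to look at the actual file contents before it can record any.

```shell
# Throwaway repository with a single committed, unmodified file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
printf 'hello\n' > file.txt
git add file.txt
git commit -q -m init

# Hypothetical name for the temporary index file.
tmp_index="$repo/.git/index.stash.demo"

# Fill a fresh index from HEAD; its entries have no stat data yet.
GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
before=$(GIT_INDEX_FILE="$tmp_index" git ls-files --debug | grep 'size:')

# Re-match the stat information with the files in the working tree.
GIT_INDEX_FILE="$tmp_index" git update-index --refresh
after=$(GIT_INDEX_FILE="$tmp_index" git ls-files --debug | grep 'size:')

echo "before refresh: $before"
echo "after refresh:  $after"
```

On a repository with gigabytes of tracked test data, this refresh step is presumably where the time goes, since every tracked file has to be compared against the temporary index.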
Peter answered Oct 23 '22 15:10