How can I monitor why git add . is slow?

Tags:

git

Assume a project where no add or commit has been done for a long time. I run git add . but it takes too much time. I would like to estimate which files/directories are the most expensive in the current case. I have a good .gitignore file that works well enough, but sometimes there is still too much, or something too difficult, to add and commit to Git.
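A rough first step, sketched here under the assumption that file size is a usable proxy for hashing and compression cost: list what git add . would pick up (tracked files with unstaged changes plus untracked files not excluded by .gitignore, via git ls-files --modified --others --exclude-standard) and total their sizes per directory. The helper names pending_paths, sizes_by_directory and print_report are hypothetical.

```python
import os
import subprocess
from collections import defaultdict


def pending_paths(repo="."):
    """Paths that `git add .` would consider: tracked files with unstaged
    changes plus untracked files not excluded by .gitignore."""
    out = subprocess.run(
        ["git", "ls-files", "--modified", "--others", "--exclude-standard", "-z"],
        cwd=repo, check=True, capture_output=True,
    ).stdout
    return [p for p in out.decode("utf-8", "surrogateescape").split("\0") if p]


def sizes_by_directory(entries, depth=1):
    """Sum (path, size) pairs per leading directory (first `depth` components).
    Files directly at the top level are grouped under "."; largest first."""
    totals = defaultdict(int)
    for path, size in entries:
        dirs = path.split("/")[:-1]  # git reports paths with "/" separators
        totals["/".join(dirs[:depth]) or "."] += size
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def print_report(repo=".", depth=1, limit=20):
    """Hypothetical helper: print the heaviest directories of pending changes."""
    entries = []
    for rel in pending_paths(repo):
        try:
            entries.append((rel, os.lstat(os.path.join(repo, rel)).st_size))
        except OSError:
            pass  # e.g. a tracked file that was deleted from the working tree
    for directory, total in sizes_by_directory(entries, depth)[:limit]:
        print(f"{total / 2**20:12.1f} MiB  {directory}")
```

Calling print_report() from the repository root lists the heaviest top-level directories; print_report(depth=2) drills one level deeper into them.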

I often have directories ranging from 300 GB to 2 TB in size inside my project. Although I exclude them with directory/* and directory/ in .gitignore, adding is still slow.
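To confirm that the huge directories really are matched by an ignore rule, and by which one, git check-ignore can be asked about each path. This is a minimal sketch, assuming Python 3 and git on PATH; the function names are made up for illustration. By default check-ignore does not report paths that are already tracked in the index, because ignore rules do not apply to tracked files, so a "NOT ignored" result for a big directory is worth investigating.

```python
import subprocess


def parse_check_ignore_z(output: bytes):
    """Parse `git check-ignore -z --verbose --non-matching` output, which is
    NUL-separated <source> <linenum> <pattern> <pathname> groups, into
    (pathname, source, linenum, pattern) tuples. source is "" when no rule matched."""
    fields = output.decode("utf-8", "surrogateescape").split("\0")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty string after the final NUL
    return [
        (fields[i + 3], fields[i], fields[i + 1], fields[i + 2])
        for i in range(0, len(fields) - 3, 4)
    ]


def report_ignored(paths, repo="."):
    """Hypothetical helper: print which of `paths` are ignored and why."""
    proc = subprocess.run(
        ["git", "check-ignore", "-z", "--verbose", "--non-matching", "--", *paths],
        cwd=repo, capture_output=True,
    )
    if proc.returncode > 1:  # 0: some path ignored, 1: none ignored, 128: error
        raise RuntimeError(proc.stderr.decode(errors="replace"))
    for path, source, linenum, pattern in parse_check_ignore_z(proc.stdout):
        if source:
            print(f"ignored      {path}  ({source}:{linenum}: {pattern})")
        else:
            print(f"NOT ignored  {path}")
```

For example, report_ignored(["bigdata", "scratch"]) (directory names hypothetical) prints the .gitignore line responsible for each match.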

How can I estimate which directories/files are too expensive to be committed?
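One crude way to estimate cost per directory is simply to time git add --dry-run for each top-level directory. This is a sketch under assumptions: a dry run stages nothing and writes no objects, so its timings probably underestimate the real add, and the helper names are hypothetical. For a finer per-phase breakdown, Git's own GIT_TRACE_PERFORMANCE or GIT_TRACE2_PERF environment variables are an alternative.

```python
import os
import subprocess
import time


def time_dry_run_add(path, repo="."):
    """Wall-clock seconds for `git add --dry-run -- <path>`; nothing is staged.
    The exit status is deliberately not checked: git may exit non-zero,
    e.g. when an explicitly named path is ignored."""
    start = time.perf_counter()
    subprocess.run(
        ["git", "add", "--dry-run", "--", path],
        cwd=repo, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def rank(timings):
    """Sort (path, seconds) pairs, slowest first; ties broken by name."""
    return sorted(timings, key=lambda kv: (-kv[1], kv[0]))


def print_timings(repo="."):
    """Hypothetical helper: time every top-level directory except .git."""
    top_level = [e.name for e in os.scandir(repo)
                 if e.is_dir(follow_symlinks=False) and e.name != ".git"]
    for path, secs in rank((p, time_dry_run_add(p, repo)) for p in top_level):
        print(f"{secs:8.2f} s  {path}")
```

Running print_timings() twice is advisable, since the first pass also measures cold filesystem caches.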

asked Aug 09 '15 at 12:08 by Léo Léopold Hertz 준영



People also ask

What happens when you run git add?

The git add command adds a change in the working directory to the staging area. It tells Git that you want to include updates to a particular file in the next commit. However, git add doesn't really affect the repository in any significant way: changes are not actually recorded until you run git commit.

Should I always use git add?

The git add command adds new or changed files in your working directory to the Git staging area. git add is an important command: without it, git commit would have nothing to record. Even so, git add sometimes has a reputation for being an unnecessary step in development.

Does git add track files?

Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot, as well as any newly staged files; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.
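The tracked/untracked split can be read directly from git status. A minimal sketch, assuming the documented --porcelain -z (v1) format, where each entry is a two-character XY code, a space and a path, and renames/copies are followed by the original path as an extra NUL-separated field; the function names are illustrative only.

```python
import subprocess


def classify_porcelain_z(output: bytes):
    """Split `git status --porcelain -z` output into tracked entries
    [(XY code, path)] and untracked paths."""
    tracked, untracked = [], []
    entries = output.decode("utf-8", "surrogateescape").split("\0")
    i = 0
    while i < len(entries):
        entry = entries[i]
        i += 1
        if not entry:
            continue  # empty string after the final NUL
        code, path = entry[:2], entry[3:]
        if code == "??":
            untracked.append(path)
        elif code == "!!":
            continue  # ignored files (only listed with --ignored)
        else:
            tracked.append((code, path))
            if "R" in code or "C" in code:
                i += 1  # skip the original path of a rename/copy
    return tracked, untracked


def status(repo="."):
    """Hypothetical helper. Untracked directories appear as a single "dir/"
    entry unless --untracked-files=all is added."""
    out = subprocess.run(["git", "status", "--porcelain", "-z"], cwd=repo,
                         check=True, capture_output=True).stdout
    return classify_porcelain_z(out)
```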

Does git commit take a long time?

It normally shouldn't: a commit usually completes in under 10 seconds, even for large commits. In problem cases, however, a commit can take 5 minutes or more.


2 Answers

Git slowness generally comes from large binary files. This isn't because they are binary; binary files simply tend to be large and harder to compress and diff.
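Git zlib-compresses the objects it writes, so one rough way to spot the worst offenders is to compress a sample from the start of each large file and look at the ratio. This is a sketch under assumptions: the 4 MiB sample size and compression level 1 are arbitrary choices, a file's first bytes may not be representative, and the function names are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 1) -> float:
    """Compressed size / original size of a sample; ~1.0 means incompressible."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def sample_ratio(path: str, sample_bytes: int = 4 * 2**20) -> float:
    """Ratio for the first `sample_bytes` of a file (an assumed sample size)."""
    with open(path, "rb") as fh:
        return compression_ratio(fh.read(sample_bytes))


def report_compressibility(paths):
    """Hypothetical helper: print size and sample ratio for each file."""
    for path in paths:
        size = os.path.getsize(path)
        print(f"{size / 2**20:10.1f} MiB  ratio={sample_ratio(path):.2f}  {path}")
```

Large files with a ratio close to 1.0 (already-compressed media, archives, encrypted data) are the usual suspects: Git spends time hashing and deflating them while gaining almost nothing.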

Based on your edit indicating the file sizes, I suspect this is your problem.

The answers to this question offer a few solutions: removing them from source control, manually running git gc, etc.
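Whether git gc helps can be checked by comparing git count-objects -v before and after running it. A minimal sketch, assuming the usual "key: value" output (sizes reported in KiB); the helper names are illustrative.

```python
import subprocess


def parse_count_objects(text: str) -> dict:
    """Parse `git count-objects -v` output ("key: value" lines) into ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value.strip())
    return stats


def count_objects(repo="."):
    """Hypothetical helper: loose vs. packed object statistics for `repo`."""
    out = subprocess.run(["git", "count-objects", "-v"], cwd=repo,
                         check=True, capture_output=True, text=True).stdout
    return parse_count_objects(out)
```

A large "count"/"size" (loose objects) that drops sharply after git gc, while "size-pack" grows, suggests that packing was overdue.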

answered Oct 24 '22 at 14:10 by Aaron Brager



"git add" needs to internally run the equivalent of "diff-files".

With Git 2.20 (Q4 2018), that codepath learned the same optimization "diff-files" already has: running lstat(2) in parallel to find which paths have been updated in the working tree.

See commit d1664e7 (02 Nov 2018) by Ben Peart (benpeart).
(Merged by Junio C Hamano -- gitster -- in commit 9235a6c, 13 Nov 2018)

add: speed up cmd_add() by utilizing read_cache_preload()

During an "add", a call is made to run_diff_files() which calls check_removed() for each index-entry.
The preload_index() code distributes some of the costs across multiple threads.

Because the files checked are restricted to pathspec, adding individual files makes no measurable impact but on a Windows repo with ~200K files, 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
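The sketch below is not Git's code; it only illustrates the idea behind preload_index(): comparing recorded stat data with fresh lstat() results on several threads. The index here is a hypothetical dict mapping path to (size, mtime_ns); Git's real index records more fields (inode, ctime, mode and others). In Git itself this behaviour is governed by the core.preloadIndex setting, which recent versions enable by default. Python's os.lstat should release the GIL during the system call, so the threads can overlap the filesystem work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def changed_paths(index, root=".", workers=8):
    """Return sorted paths whose (size, mtime_ns) differ from the hypothetical
    `index` entry, or that no longer exist, stat-ing files on a thread pool."""
    def check(item):
        path, (size, mtime_ns) = item
        try:
            st = os.lstat(os.path.join(root, path))
        except FileNotFoundError:
            return path  # removed from the working tree
        if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
            return path  # stat data changed, so contents may have changed
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(p for p in pool.map(check, index.items()) if p is not None)
```

As in the commit message, the benefit only shows up with many entries: for a handful of paths the thread-pool overhead outweighs the parallel lstat calls.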

answered Oct 24 '22 at 13:10 by VonC
