Assume a project where no add and commit has been done for a long time. I run git add . but it takes too much time, and I would like to estimate which files/directories are the most expensive in the current case. I have a good .gitignore file that works well enough, but sometimes I still end up with too much content, or content too large, to be added and committed to Git. My project often contains directories whose sizes range from 300 GB to 2 TB. Even after excluding them with directory/* and directory/ in .gitignore, the add is slow.
How can I estimate which directories/files are too expensive to be committed?
The git add command adds new or changed files in your working directory to the Git staging area. It tells Git that you want to include updates to a particular file in the next commit. However, git add doesn't affect the repository itself in any significant way: changes are not actually recorded until you run git commit. Without git add, no git commit would ever do anything.
Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot, as well as any newly staged files; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.
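The tracked/untracked states above can be seen directly with git status --short. A minimal sketch, using a throwaway repository (the file names are made up for illustration):

```shell
# Create a throwaway repo to demonstrate file states.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .

# A tracked file: committed once, then modified.
echo one > tracked.txt
git add tracked.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm initial
echo changed >> tracked.txt

# An untracked file: Git has never seen it.
echo two > untracked.txt

git status --short
# ' M' marks a tracked-but-modified file, '??' an untracked one.
```

Untracked files matter here: git add . has to examine every untracked, non-ignored path, so a handful of huge untracked directories can dominate its runtime.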
Expected behavior: commit in the usual timeframe, usually under 10 seconds, even for large commits. Actual behavior: commit takes 5 minutes or more.
Git slowness generally comes from large binary files. This isn't because they're binary as such; it's that binary files tend to be large and harder to compress and diff.
Based on your edit indicating the file sizes, I suspect this is your problem.
The answers to this question offer a few solutions: removing them from source control, manually running git gc, etc.
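To answer the estimation part of the question directly, one approach is to list the paths git add . would actually pick up (untracked and not ignored) and sort them by on-disk size. A sketch, assuming GNU coreutils (du, sort -h) and a POSIX shell, run from the repository root:

```shell
# List untracked, non-ignored files -- exactly what 'git add .' must
# process -- with their sizes, largest first.
#   --others           = untracked files only
#   --exclude-standard = honor .gitignore, .git/info/exclude, etc.
git ls-files --others --exclude-standard -z |
  xargs -0 -r du -sh 2>/dev/null |
  sort -rh |
  head -20

# Raw on-disk size per top-level directory (ignores .gitignore),
# useful for spotting the 300 GB - 2 TB trees mentioned above.
du -sh -- */ 2>/dev/null | sort -rh | head -20
```

If a large directory shows up in the first listing despite a .gitignore entry, the ignore pattern isn't matching; git check-ignore -v <path> will tell you which rule (if any) applies.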
"git add
" needs to internally run "diff-files
" equivalent,
With Git 2.20 (Q4 2018), the codepath learned the same optimization as "diff-files
" has to run lstat(2)
in parallel to find which paths have been updated in the working tree.
See commit d1664e7 (02 Nov 2018) by Ben Peart (benpeart).
(Merged by Junio C Hamano -- gitster -- in commit 9235a6c, 13 Nov 2018)
add: speed up cmd_add() by utilizing read_cache_preload()
During an "add", a call is made to run_diff_files() which calls check_removed() for each index-entry. The preload_index() code distributes some of the costs across multiple threads.
Because the files checked are restricted to pathspec, adding individual files makes no measurable impact, but on a Windows repo with ~200K files, 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
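To see where your own git add . spends its time, you can ask Git for its internal timings. A sketch using GIT_TRACE_PERFORMANCE, which makes Git print elapsed time for its operations (and the overall command) to stderr:

```shell
# Run 'git add .' with performance tracing enabled and keep only the
# timing lines. The final 'git command:' line shows total elapsed time;
# earlier lines break out individual phases.
GIT_TRACE_PERFORMANCE=1 git add . 2>&1 | grep 'performance:'
```

Comparing this output before and after upgrading past Git 2.20, or after pruning the largest untracked directories, shows how much each change actually buys you.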