Assume a project where no add or commit has been done for a long time.
I run "git add ." but it takes too much time.
I would like to estimate which files/directories are the most expensive in the current case.
I have a good .gitignore file which works sufficiently, but I still sometimes have too much, or something too large, to be added and committed to Git.
My working tree often contains directories whose size ranges from 300 GB to 2 TB.
Even though I exclude them with directory/* and directory/ in .gitignore, the add is slow.
How can I estimate which directories/files are too expensive to be committed?
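One rough way to answer this (a sketch using standard tools, not dedicated Git tooling): rank the working-tree directories by size with du, then check with "git check-ignore" whether the large ones are actually excluded. The path big-data/ below is a placeholder for one of your huge directories.

```shell
# Rank top-level directories by size (in KB) to see what "git add ." must scan.
du -sk -- */ 2>/dev/null | sort -rn | head -10

# Confirm a suspect directory really is ignored; "git check-ignore -v"
# prints the matching .gitignore rule, or exits non-zero if nothing matches.
git check-ignore -v big-data/ || echo "big-data/ is NOT ignored"
```

If a multi-hundred-GB directory shows up near the top and check-ignore reports no matching rule, that directory is the likely culprit.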
git add: the "git add" command adds new or changed files in your working directory to the Git staging area. It tells Git that you want to include updates to particular files in the next commit. However, "git add" doesn't affect the repository in any significant way by itself; changes are not actually recorded until you run "git commit". Without "git add", no "git commit" would ever do anything, even though the command sometimes has a reputation for being an unnecessary extra step.
Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot, as well as any newly staged files; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.
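A minimal illustration of these states in a throwaway repository (the file names are arbitrary; the -c identity flags just let the commit run without global config):

```shell
git init -q demo && cd demo
echo one > tracked.txt
git add tracked.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init
echo two >> tracked.txt    # tracked and modified, but not staged
echo new > untracked.txt   # untracked: Git does not know about it yet
git status --short
# Typical output:
#  M tracked.txt
# ?? untracked.txt
```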
Expected behavior: commit in the usual timeframe, usually under 10 seconds, even for large commits. Actual behavior: commit takes 5 minutes or more.
Git slowness generally comes from large binary files. This isn't because they're binary as such; binary files simply tend to be large and expensive to compress and diff.
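If large binaries have already made it into commits, a well-known rev-list/cat-file pipeline lists the biggest blobs stored in the repository's history (sizes in bytes):

```shell
# List the 10 largest blobs in history: type, hash, size, path.
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k3 -rn \
  | head -10
```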
Based on your edit indicating the file sizes, I suspect this is your problem.
The answers to this question offer a few solutions: removing them from source control, manually running git gc, etc.
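For example, to stop tracking a huge directory without deleting it from disk, then repack (big-data/ is a placeholder path; note this only removes it from future commits, not from existing history):

```shell
git rm -r --cached big-data/        # untrack the directory, keep files on disk
echo 'big-data/' >> .gitignore      # keep it out of future "git add ."
git add .gitignore
git commit -m "Stop tracking big-data/"
git gc                              # repack loose objects
```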
"git add" needs to internally run "diff-files" equivalent,
With Git 2.20 (Q4 2018), the codepath learned the same optimization as "diff-files" has to run lstat(2) in parallel to find which paths have been updated in the working tree.
See commit d1664e7 (02 Nov 2018) by Ben Peart (benpeart).
(Merged by Junio C Hamano -- gitster -- in commit 9235a6c, 13 Nov 2018)
add: speed up cmd_add() by utilizing read_cache_preload()

During an "add", a call is made to run_diff_files(), which calls check_removed() for each index entry. The preload_index() code distributes some of the costs across multiple threads.

Because the files checked are restricted to the pathspec, adding individual files makes no measurable impact, but on a Windows repo with ~200K files, "git add ." drops from 6.3 seconds to 3.3 seconds, a 47% savings.
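Related configuration knobs (these are real settings: the parallel preload is governed by core.preloadIndex, which has been on by default since Git 2.1, and core.fscache is specific to Git for Windows):

```shell
git config core.preloadindex true    # run lstat(2) preloading in parallel
git config core.fscache true         # Git for Windows: cache file-system calls
git config core.untrackedcache true  # cache untracked-file directory scans
```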