I have a "fresh" git-svn repo (11.13 GB) that has over 100,000 objects in it.
I have performed
git fsck
git gc
on the repo after the initial checkout.
I then tried to do a
git status
The time it takes to do a git status is anywhere between 2m25.578s and 2m53.901s.
I tested git status by issuing the command
time git status
5 times, and every run fell between the two times listed above.
I am doing this on Mac OS X, locally, not through a VM.
There is no way it should be taking this long.
Any ideas? Help?
Thanks.
Edit
I have a co-worker sitting right next to me with a comparable box, with less RAM and running Debian on a JFS filesystem. His git status runs in 0.3 seconds on the same repo (it is also a git-svn checkout).
Also, I recently changed my file permissions (to 777) on this folder and it brought the time down considerably (why, I have no clue). I can now get it done in anywhere between 3 and 6 seconds. This is manageable, but still a pain.
Git slowness is often caused by large binary files. This isn't because they're binary, but because binary files tend to be large and are more expensive to compress and diff. Based on your edit indicating the file sizes, I suspect this is your problem.
The first thing to determine is whether the poor behavior is due to your machine or to your specific local copy of the repo. The files in your .git folder can affect performance in various ways: settings in .git/config, the presence of LFS files, commits that can be garbage collected, etc.
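To compare two machines or two local copies, it can help to print the same handful of facts on each side. The following is only a rough sketch: the function name repo_report is made up for illustration, and it uses nothing beyond git ls-files, git count-objects -v and git config --local --list.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a git command):
# print facts about a local copy that commonly affect `git status` speed.
repo_report() {
    repo="${1:-.}"
    # Number of tracked files; `git status` has to check each of them.
    # `tr -d ' '` strips the padding that BSD/macOS `wc` adds.
    printf 'tracked files: %s\n' "$(git -C "$repo" ls-files | wc -l | tr -d ' ')"
    # Loose vs. packed object counts and sizes, plus any garbage.
    git -C "$repo" count-objects -v
    # Settings stored in this copy's .git/config.
    git -C "$repo" config --local --list
}
```

Run repo_report /path/to/repo on the slow and the fast machine and compare the output. If both reports look alike, the difference is more likely in the machine or filesystem than in the local copy.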
The git fsck command checks the connectivity and validity of objects in the git repository. Using this command, users can confirm the integrity of the files in their repository and identify any corrupted objects.
It came down to a couple of items, as far as I can see right now:
git gc --aggressive
Changing the file permissions to 777
There has to be something else going on, but these were the things that clearly made the biggest impact.
git status has to look at every file in the repository every time. You can tell it to stop looking at trees that you aren't working on with
git update-index --assume-unchanged <trees to skip>
From the manpage:
When these flags are specified, the object names recorded for the paths are not updated. Instead, these options set and unset the "assume unchanged" bit for the paths. When the "assume unchanged" bit is on, git stops checking the working tree files for possible modifications, so you need to manually unset the bit to tell git when you change the working tree file. This is sometimes helpful when working with a big project on a filesystem that has very slow lstat(2) system call (e.g. cifs).
This option can be also used as a coarse file-level mechanism to ignore uncommitted changes in tracked files (akin to what .gitignore does for untracked files). Git will fail (gracefully) in case it needs to modify this file in the index e.g. when merging in a commit; thus, in case the assumed-untracked file is changed upstream, you will need to handle the situation manually.
Many operations in git depend on your filesystem to have an efficient lstat(2) implementation, so that st_mtime information for working tree files can be cheaply checked to see if the file contents have changed from the version recorded in the index file. Unfortunately, some filesystems have inefficient lstat(2). If your filesystem is one of them, you can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.
...
In order to set "assume unchanged" bit, use --assume-unchanged option. To unset, use --no-assume-unchanged.
The command looks at core.ignorestat configuration variable. When this is true, paths updated with git update-index paths… and paths updated with other git commands that update both index and working tree (e.g. git apply --index, git checkout-index -u, and git read-tree -u) are automatically marked as "assume unchanged". Note that "assume unchanged" bit is not set if git update-index --refresh finds the working tree file matches the index (use git update-index --really-refresh if you want to mark them as "assume unchanged").
Now, clearly, this solution is only going to work if there are parts of the repo that you can conveniently ignore. I work on a project of similar size, and there are definitely large trees that I don't need to check on a regular basis. The semantics of git-status make it generally an O(n) problem (n being the number of files). You need domain-specific optimizations to do better than that.
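Since git update-index works on file paths, flagging a whole tree means feeding it every tracked file under that tree. Here is a minimal sketch of how that might be wrapped up; the function names skip_tree, unskip_tree and list_skipped are hypothetical, and the directory names in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers; the names are assumptions, not git commands.

# Set the "assume unchanged" bit on every tracked file under the given paths.
# With no arguments, `git ls-files` lists the whole repository.
skip_tree() {
    git ls-files -z -- "$@" | xargs -0 git update-index --assume-unchanged
}

# Clear the bit again, e.g. before editing files in that tree.
unskip_tree() {
    git ls-files -z -- "$@" | xargs -0 git update-index --no-assume-unchanged
}

# `git ls-files -v` prints a lowercase tag letter for paths whose
# "assume unchanged" bit is set; keep those lines and drop the tag.
list_skipped() {
    git ls-files -v | grep '^[[:lower:]] ' | cut -c3-
}
```

For example, skip_tree vendor/ docs/ followed by time git status should show whether skipping those trees helps. list_skipped is worth running now and then, because it is easy to forget which paths git has stopped checking.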
Note that if you work in a stitching pattern, that is, if you integrate changes from upstream by merge instead of rebase, then this solution becomes less convenient, because a change to an --assume-unchanged object merging in from upstream becomes a merge conflict. You can avoid this problem with a rebasing workflow.
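If you do merge, one way to see trouble coming is to check beforehand which flagged paths the other side has touched. The sketch below is an illustration only: skipped_changed_upstream is a made-up name, and it assumes the ref you pass in is the branch you are about to merge.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a git command):
# print "assume unchanged" paths that the given ref changed since the
# merge base with HEAD.
skipped_changed_upstream() {
    upstream="$1"
    flagged=$(mktemp "${TMPDIR:-/tmp}/flagged.XXXXXX") || return 1
    # Paths with the bit set show a lowercase tag in `git ls-files -v`.
    git ls-files -v | grep '^[[:lower:]] ' | cut -c3- > "$flagged"
    # Three-dot diff: what the other side changed since the merge base.
    # Keep only the paths that exactly match a flagged path.
    git diff --name-only "HEAD...$upstream" | grep -F -x -f "$flagged"
    rm -f "$flagged"
}
```

Any path this prints is a candidate for git update-index --no-assume-unchanged before you merge.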