Git is really slow for 100,000 objects. Any fixes?

Tags:

performance

git

git-svn

I have a "fresh" git-svn repo (11.13 GB) that has over 100,000 objects in it.

I have performed

git fsck
git gc

on the repo after the initial checkout.

I then tried to do a

git status

The time it takes to do a git status is anywhere between 2m25.578s and 2m53.901s.

I tested git status by issuing the command

time git status

five times, and every run fell between the two times listed above.

I am doing this on Mac OS X, locally, not through a VM.

There is no way it should be taking this long.

Any ideas? Help?

Thanks.

Edit

I have a co-worker sitting right next to me with a comparable box: less RAM, running Debian with a jfs filesystem. His git status runs in 0.3 seconds on the same repo (it is also a git-svn checkout).

Also, I recently changed my file permissions (to 777) on this folder and it brought the time down considerably (why, I have no clue). I can now get it done anywhere between 3 and 6 seconds. This is manageable, but still a pain.

266

asked Jul 22 '10 22:07

manumoomoo

2 Answers

It came down to a couple of items that I can see right now.

git gc --aggressive
Opening up file permissions to 777

There has to be something else going on, but these were the things that clearly made the biggest impact.
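As a rough illustration of the first item only, here is a minimal sketch that checks loose-object and pack counts with `git count-objects -v` before and after `git gc --aggressive`. It assumes a throwaway repository created with `mktemp`; the file name, commit messages, and identity are hypothetical, and it does not reproduce the 11 GB repo or the permissions change.

```shell
set -eu

# Throwaway repository (hypothetical; not the repo from the question).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Create a handful of commits so there are loose objects to pack.
i=1
while [ "$i" -le 5 ]; do
  echo "revision $i" > file.txt
  git add file.txt
  git -c user.name=example -c user.email=example@example.invalid \
    commit -q -m "commit $i"
  i=$((i + 1))
done

# "count:" is the number of loose objects, "packs:" the number of packfiles.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc --aggressive --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose_before -> $loose_after, packs: $packs_after"
```

On a real repository, comparing `time git status` before and after the same command would show whether repacking is what actually helps.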

113

answered Oct 21 '22 22:10

manumoomoo

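git status has to look at every file in the repository every time. You can tell it to stop checking the files in trees that you aren't working on with

git update-index --assume-unchanged <files to skip>

Note that update-index operates on file paths, so a directory has to be expanded to the tracked files inside it.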

source

From the manpage:

When these flags are specified, the object names recorded for the paths are not updated. Instead, these options set and unset the "assume unchanged" bit for the paths. When the "assume unchanged" bit is on, git stops checking the working tree files for possible modifications, so you need to manually unset the bit to tell git when you change the working tree file. This is sometimes helpful when working with a big project on a filesystem that has very slow lstat(2) system call (e.g. cifs).

This option can be also used as a coarse file-level mechanism to ignore uncommitted changes in tracked files (akin to what .gitignore does for untracked files). Git will fail (gracefully) in case it needs to modify this file in the index e.g. when merging in a commit; thus, in case the assumed-untracked file is changed upstream, you will need to handle the situation manually.

Many operations in git depend on your filesystem to have an efficient lstat(2) implementation, so that st_mtime information for working tree files can be cheaply checked to see if the file contents have changed from the version recorded in the index file. Unfortunately, some filesystems have inefficient lstat(2). If your filesystem is one of them, you can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.

...

In order to set "assume unchanged" bit, use --assume-unchanged option. To unset, use --no-assume-unchanged.

The command looks at core.ignorestat configuration variable. When this is true, paths updated with git update-index paths… and paths updated with other git commands that update both index and working tree (e.g. git apply --index, git checkout-index -u, and git read-tree -u) are automatically marked as "assume unchanged". Note that "assume unchanged" bit is not set if git update-index --refresh finds the working tree file matches the index (use git update-index --really-refresh if you want to mark them as "assume unchanged").
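The set and unset steps described in the manpage excerpt can be sketched as follows. This is a minimal example under stated assumptions: the throwaway repository, the `vendor/` and `src/` directories, and the commit identity are all hypothetical. `git ls-files` expands the directory into file paths, and `git ls-files -v` marks assume-unchanged entries with a lowercase tag letter.

```shell
set -eu

# Throwaway repository with a tree we want status to skip (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p vendor src
echo one > vendor/lib.txt
echo two > src/app.txt
git add .
git -c user.name=example -c user.email=example@example.invalid \
  commit -q -m "initial"

# Set the "assume unchanged" bit on every tracked file under vendor/.
git ls-files -z -- vendor | xargs -0 git update-index --assume-unchanged

# List flagged paths: ls-files -v prints a lowercase tag (h) for them.
flagged=$(git ls-files -v | grep '^h' | cut -c3-)

# A change to a flagged file is no longer reported by status.
echo changed >> vendor/lib.txt
hidden_status=$(git status --porcelain)

# Unset the bit again; the modification becomes visible.
git ls-files -z -- vendor | xargs -0 git update-index --no-assume-unchanged
visible_status=$(git status --porcelain)

echo "flagged: $flagged"
echo "status while flagged: '$hidden_status'"
echo "status after unset: '$visible_status'"
```

The middle step is the trade-off the manpage warns about: while the bit is set, real edits to those files go unnoticed until it is cleared.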

Now, clearly, this solution is only going to work if there are parts of the repo that you can conveniently ignore. I work on a project of similar size, and there are definitely large trees that I don't need to check on a regular basis. The semantics of git-status make it a generally O(n) problem (n being the number of files). You need domain-specific optimizations to do better than that.
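To get a feel for how much of that O(n) work a skipped tree removes, one rough approach is to count index entries before and after flagging. This is only a sketch: the throwaway repository and the `big/` and `small/` directories are hypothetical, and the count is a proxy for the files status still has to check, not a timing measurement.

```shell
set -eu

# Throwaway repository: one large tree and one small one (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p big small
i=1
while [ "$i" -le 20 ]; do
  echo "$i" > "big/file$i.txt"
  i=$((i + 1))
done
echo a > small/a.txt
git add .
git -c user.name=example -c user.email=example@example.invalid \
  commit -q -m "initial"

# n: every tracked path in the index.
total=$(git ls-files | wc -l | tr -d '[:space:]')

# Flag the large tree, then count entries that are NOT marked assume-unchanged.
git ls-files -z -- big | xargs -0 git update-index --assume-unchanged
still_checked=$(git ls-files -v | grep -vc '^h')

echo "tracked: $total, still checked by status: $still_checked"
```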

Note that if you work in a stitching pattern, that is, if you integrate changes from upstream by merge instead of rebase, then this solution becomes less convenient, because a change to an --assume-unchanged object merging in from upstream becomes a merge conflict. You can avoid this problem with a rebasing workflow.

answered Oct 21 '22 21:10

masonk
