
WHAT operations become slow when git repos become large, and WHY?

Tags:

git

This question has been asked in various forms on SO and elsewhere, but no answer I was able to find has satisfied me, because none lists the problematic/non-problematic actions/commands, and none gives a thorough explanation of the technical reason for the speed hit.

For instance:

  • Why can't Git handle large files and large repos
  • Why git operations becomes slow when repo gets bigger
  • Git is really slow for 100,000 objects. Any fixes?

So, I am forced to ask again:

  1. Of the basic git actions (commit, push, pull, add, fetch, branch, merge, checkout), which become slower as the repo grows larger? (NOTE: repos, not files, for this question.)

And,

  2. Why does each action depend on repo size (or not)?

I don't care right now about how to fix that. I only care about which actions' performance gets hit, and why, given Git's current architecture.


Edit for clarification:

It is obvious that git clone, for instance, would be O(n) in the size of the repo.

However, it is not clear to me that git pull would be the same, because it is theoretically possible to look only at the differences.

Git does some non-trivial stuff behind the scenes, and I am not sure when and which.
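
A quick way to observe some of that work, assuming a POSIX shell and a stock Git install, is:

    git count-objects -v -H                         # how large the object database has grown
    GIT_TRACE=1 GIT_TRACE_PERFORMANCE=1 git pull    # trace the commands Git runs internally, with timings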


Edit2:

I found this article, stating

If you have large, undiffable files in your repo such as binaries, you will keep a full copy of that file in your repo every time you commit a change to the file. If many versions of these files exist in your repo, they will dramatically increase the time to checkout, branch, fetch, and clone your code.

I don't see why branching should take more than O(1) time, and I am also not sure the list is complete (for example, what about pulling?).

asked Jul 21 '19 by Gulzar


People also ask

How do I manage a large Git repository?

One way out of the problem of large files is to use submodules, which let you manage one Git repository within another. You can create a submodule that contains all your binary files, keep the rest of the code in the parent repository, and update the submodule only when necessary.
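
A minimal sketch of that setup (the URL and paths below are placeholders, not part of the original answer):

    git submodule add https://example.com/binary-assets.git assets   # separate repo holding the binaries
    git commit -m "Track binary assets as a submodule"

    # later, pull in newer assets only when actually needed:
    git submodule update --remote assets
    git commit -am "Bump assets submodule"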

What is a big Git repo?

A large Git repository can be a repository that contains a large number of files in its head commit. This can negatively affect the performance of virtually all local Git operations. A common mistake that leads to this problem is to add the source code of external libraries to a Git repository.
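
A quick, rough way to check this (sketch, assuming a POSIX shell):

    git ls-files | wc -l        # number of files tracked in the head commit
    git rev-list --count HEAD   # number of commits reachable from the current branch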




2 Answers

However it is not clear to me that git pull would be the same, because it is theoretically possible to only look at differences.

Since Git 2.23 (Q3 2019), it is not O(N), but O(n log(N)): see "Git fetch a branch once with a normal name, and once with capital letter".

The main cost is the history (commit graph) traversal: checking what we have and have not (or computing forced-update status).
That is why, for large repositories, recent Git releases have introduced (a short sketch of enabling some of these locally follows this list):

  • reachability bitmaps,
  • commit-graph files,
  • a loose object cache,
  • commit-graph chains,
  • and pack-file tree discovery for push commands.
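
For reference, a minimal sketch of turning some of these on locally, assuming Git 2.23 or newer (exact flags and config keys vary by version):

    git config core.commitGraph true          # let Git read commit-graph files
    git commit-graph write --reachable        # precompute the commit graph for faster history traversal
    git repack -a -d --write-bitmap-index     # write reachability bitmaps alongside the pack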

they will dramatically increase the time to checkout, branch, fetch, and clone

That is not because those operations become worse than O(1).
It has to do with the size of the many binaries that have to be transferred or copied around when doing those operations.
Creating a new branch remains very fast, but switching to it when you have to update those binary files can be slow, simply from an I/O perspective (copying, updating, and deleting large files).
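
For illustration (the branch names here are hypothetical), creating a branch only writes a tiny ref file, while checking out a commit that differs from the current one has to rewrite the working tree:

    git branch feature-x                 # effectively O(1): writes one small file under .git/refs/heads/
    cat .git/refs/heads/feature-x        # contains a single commit hash
    git checkout feature-x               # cheap here: same commit, so the working tree is untouched
    git checkout old-release             # can be slow: every file that differs must be written or removed on disk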

answered Oct 21 '22 by VonC


I see two major issues which you have opened for discussion. First, you are asking about which Git operations get slower as repos get larger. The answer is, most Git operations will get slower as the repo gets larger. But the operations which would make Git seem noticeably slower are those which involve interacting with the remote repository. It should be intuitive to you that if the repo bloats, then things like cloning, pulling, and pushing would take longer.
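
As a rough illustration (the URL is a placeholder), the operations that talk to the remote are the ones whose cost grows with how much data has to move:

    time git clone https://example.com/big-repo.git   # downloads every reachable object; grows with total history size
    cd big-repo
    time git fetch origin                             # once up to date, mostly ref negotiation; usually quick
    time git push origin HEAD                         # dominated by the size of the new objects being uploaded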

The other issue you have touched on concerns whether large binary files should even be committed in the first place. When you make a commit, each file in the commit is compressed and stored as an object in the repository. Binary files tend not to compress well, so adding large binary files can, over time, cause your repo to bloat. In fact, many teams configure their remote (e.g. GitHub) to block commits containing large binaries.
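
A small experiment makes this concrete (the file name is arbitrary; assumes a Linux shell with GNU coreutils): committing two versions of an incompressible 10 MB file stores two full blobs.

    dd if=/dev/urandom of=model.bin bs=1M count=10
    git add model.bin && git commit -m "binary v1"
    dd if=/dev/urandom of=model.bin bs=1M count=10
    git add model.bin && git commit -m "binary v2"
    git count-objects -v -H    # roughly 20 MB of loose objects; neither version compresses or deltas away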

answered Oct 21 '22 by Tim Biegeleisen