When to break up a large Git repository into smaller ones?

I am migrating from SVN to Git. I have already used git-svn to get the history into a single Git repository, and I already know how to use git-subtree to split that repository into smaller ones. This question is not about how to do the migration; it is about when to split and when not to split.

I want to split the large repository because some of the directories are self-contained libraries that are also shared with other projects. Previously, an svn checkout was done on the library alone, without the need to check out the entire project. Along the way I discovered that there are probably dozens of directories that would make sense in their own repository because they are 1) independent and 2) shared across projects.

Once you get above a handful of Git repositories, it seems prudent to use a tool that makes working with many repositories easier. Some examples are Google's repo, git submodules, git subtree, and a custom script (it appears that Chromium does this). I have explored these various methods and understand how to use them.

So the question is about direction for the transition from Subversion.

Should I try to stick to one large Git repository, splitting it into smaller pieces only when absolutely necessary, or should I split it into dozens or potentially hundreds of smaller repositories? Which would be easier to work with? Is there another solution that I have missed? If I go with many repositories, which tool should I use? What factors would make someone favor one method over another?

Note: The source needs to be checked out on Windows, macOS, and Linux.

254

asked Feb 21 '14 17:02

onionjake

1 Answer

That process can be guided by a component approach, where you identify coherent sets of files (an application, a project, a library).

In terms of history (in a source control tool), a coherent set is one that will be labelled, branched, or merged as a whole, independently of the other sets of files.

For a distributed version control system (like Git), each of those sets of files is a good candidate for a Git repo of its own, and you can then group the ones you need for a specific project in a parent repo with submodules.
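As a rough illustration of what "a parent repo with submodules" records, here is a minimal Python sketch that renders the text of a `.gitmodules` file for a list of components. The function name `render_gitmodules`, the paths, and the `example.com` URLs are hypothetical. In practice `git submodule add <url> <path>` writes this file for you and also records the submodule commit in the parent repo, so the sketch only shows the resulting layout; it does not replace that command.

```python
def render_gitmodules(components):
    """Render .gitmodules text for (path, url) pairs.

    Assumption: each submodule is named after its path, which is what
    `git submodule add` does by default.
    """
    lines = []
    for path, url in components:
        lines.append(f'[submodule "{path}"]')
        lines.append(f"\tpath = {path}")
        lines.append(f"\turl = {url}")
    return "\n".join(lines) + "\n"


# Hypothetical components of one project, each living in its own repo.
components = [
    ("libs/parser", "https://example.com/git/parser.git"),
    ("libs/net", "https://example.com/git/net.git"),
]
gitmodules_text = render_gitmodules(components)
print(gitmodules_text)
```

Each project that shares a library would list it in its own parent repo, pinned to whichever commit that project needs.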

I describe this approach, for instance, in:

"Git repository setup for a project that has a server and client" (server and client being two obvious coherent separate sets which benefit from having their own repo)
"What is Component-Driven Development?"

The opposite (keeping everything in one repo) is called the "system-based approach", but it can lead to a huge Git repo, which, as I mentioned in "Performance for Git", isn't compatible with how Git is implemented.

The OP onionjake asks in the comments:

Could you please include more information on the subtleties of identifying components?

This process (of identifying "components", which in turn become Git repos) is guided by the software architecture of your system.
Any subset which acts as an independent set of files is a good candidate for its own repo. It can be a library or a DLL, but also part of an application (a GUI, a client vs. a server, a dispatcher, ...).

Each time you identify a group of tightly linked files (meaning that modifying one will likely affect the others), they should be part of the same component or, in Git terms, the same repo.
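One way to get a first hint about which files are "tightly linked" is to look at which directories tend to change in the same commit. The Python sketch below is only a heuristic under stated assumptions, not a substitute for the architectural judgment described above: it treats each top-level directory as a candidate component, counts how often two directories appear in the same commit, and groups those that co-change at least `threshold` times. The function names, the sample paths, and the threshold are hypothetical. The per-commit file lists could be collected from something like `git log --name-only`, which is not shown here.

```python
from collections import Counter
from itertools import combinations


def top_level_dir(path):
    """Return the first path segment, used here as the candidate component."""
    return path.split("/", 1)[0]


def co_change_counts(commits):
    """Count how often each pair of top-level directories changes together.

    `commits` is a list of lists of file paths, one inner list per commit.
    """
    pair_counts = Counter()
    for files in commits:
        dirs = sorted({top_level_dir(f) for f in files})
        for a, b in combinations(dirs, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def group_components(commits, threshold):
    """Merge directories whose co-change count reaches `threshold`."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every directory, even ones that never co-change.
    for files in commits:
        for f in files:
            find(top_level_dir(f))
    for (a, b), count in co_change_counts(commits).items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in list(parent):
        groups.setdefault(find(d), set()).add(d)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical history: client and protocol usually change together.
commits = [
    ["gui/main.c", "gui/widgets.c"],
    ["client/net.c", "protocol/msg.h"],
    ["client/net.c", "protocol/msg.h", "gui/main.c"],
    ["server/db.c"],
    ["client/api.c", "protocol/msg.h"],
]
print(group_components(commits, threshold=2))
# → [['client', 'protocol'], ['gui'], ['server']]
```

In this sample, `client` and `protocol` would be candidates for a single repo, while `gui` and `server` look independent. Directories that co-change only because of sweeping commits (license headers, reformatting) would need to be filtered out before trusting the result.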

176

answered Sep 21 '22 14:09

VonC

Tags: git, version-control, repository, git-submodules, git-subtree
