We have a large C++ repository, about 80 GB in size with nearly 200,000 files, containing multiple components.
The libraries (archives) are shared by many of the components and are tightly coupled.
Because of this, all git operations, as well as compiling/building a particular component, take too long.
Please suggest how to divide this single repo into multiple repos.
First, 200,000 source files are likely to take less than 80 GB of space (unless each file represents 400 KB of source!)
Update 2015: git-lfs (Git Large File Storage) can actually manage that kind of volume.
See "Efficient storage of binary files in a git repository".
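One way to check whether large binary files account for most of those 80 GB is to list the biggest blobs in the history. This is a rough sketch using standard git plumbing; the function name largest_blobs is only an illustration, and paths containing spaces are cut at the first space.

```shell
# List the largest blobs in a repository's whole history.
# $1 = path to the repository, $2 = how many entries to show (default 10)
largest_blobs() {
  # rev-list prints "<sha> <path>" for every object reachable from any ref;
  # cat-file echoes the type and size, and %(rest) carries the path through.
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |
    sort -rn |
    head -n "${2:-10}"
}
```

For example, largest_blobs /path/to/repo 20 prints the 20 largest blobs as "size-in-bytes path". If the top entries are archives or executables, they are good candidates for an artifact store or git-lfs rather than plain git.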
Original answer (2013)
That means:
Second, git operations are only slow if we are talking about one huge repo.
Git is designed to manage multiple small repos (even the Linux kernel git repo is nowhere near the size and number of files you mention).
So you need:
to split the huge git repo around:
to speed up the compilation process, especially when doing unit or small integration tests, by using binary dependencies: instead of getting all the sources and recompiling everything, you could set up each project so that it uses the binaries/executables produced by the other projects it needs in order to compile and run.
That depends on how tightly coupled your libraries are with the other components.
The OP user2463892 adds in the comments:
I heard something about Git submodules, which would help in dividing or splitting the large code base.
I am not familiar with this. Can anyone help me understand a few of my questions regarding it, as below?
1) How do git submodules work? Will they divide the huge code base into multiple repos? With this, can we solve the problem of git slowness?
A submodule is a git repo declared within another repo (which becomes a "parent" repo).
The parent repo records a fixed, known reference to a submodule repo as a special entry, which means:
when you clone a parent repo, you don't clone by default all the submodules declared in it
And that could be interesting in your case, as you don't need to clone all the sources in order to make the kind of incremental compilation you mention.
Plus, multiple repos means smaller repos, with commands like checkout, log, diff and status going faster.
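As a rough sketch of that submodule workflow: the two helper functions below are illustrations only (their names are not part of git), and the URLs and paths you pass in are whatever your own layout uses.

```shell
# Register an existing component repo as a submodule of the parent repo.
# $1 = parent checkout, $2 = component repo URL, $3 = path inside the parent
add_component() {
  # "submodule add" clones the component and stages .gitmodules plus the
  # special entry that pins the component to one commit.
  git -C "$1" submodule add "$2" "$3" &&
    git -C "$1" commit -m "Add $3 as a submodule"
}

# Clone the parent only, then fetch a single component on demand.
# $1 = parent repo URL, $2 = destination directory, $3 = submodule path
clone_with_one_component() {
  # A plain clone leaves every submodule directory empty.
  git clone "$1" "$2" &&
    git -C "$2" submodule update --init -- "$3"
}
```

Someone working only on one component would call clone_with_one_component with that component's path and never download the others. To get everything at once, git clone --recurse-submodules does that in one step.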
2) Assume we divided the main repo into multiple repos by using these submodules... will this solve the problem which we faced (dependency between repos)?
Example: Assume we divide the main core repo into Super, RepoA, RepoB, RepoC, etc.
Then will it be possible to compile all these repos together?
Can RepoA access the library from other repos (Super, RepoB, RepoC, etc.) and vice versa?
The mutual dependencies will still be there, but you would be able:
to build repoA on its own and generate binaries for repoB or repoC to use. The goal is to switch from a source-only dependency to a (generated) binary dependency, where repoB can be compiled based on the binaries produced by the repoA compilation step.
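A minimal sketch of such a binary dependency, under several assumptions that are mine and not the OP's: each component keeps its .cpp and .h files in one directory, libraries are static archives, a C++ compiler (c++ or whatever $CXX names) and ar are on the PATH, and the function names and the shared prefix directory are made up for illustration.

```shell
# Build a component once and publish its archive and headers to a shared prefix.
# $1 = component checkout (e.g. repoA), $2 = library name, $3 = shared prefix
publish_library() {
  mkdir -p "$3/lib" "$3/include" || return 1
  for src in "$1"/*.cpp; do
    "${CXX:-c++}" -c "$src" -o "${src%.cpp}.o" || return 1
  done
  ar rcs "$3/lib/lib$2.a" "$1"/*.o &&
    cp "$1"/*.h "$3/include/"
}

# Compile a consumer against the published binaries, without repoA's sources.
# $1 = consumer checkout (e.g. repoB), $2 = library name, $3 = shared prefix
build_against() {
  "${CXX:-c++}" -I"$3/include" "$1"/*.cpp -L"$3/lib" -l"$2" -o "$1/app"
}
```

With this split, repoB only needs the prefix directory (or an artifact repository serving the same files), so a change in repoB does not force a recompilation of repoA.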
You can create separate repositories for individual folders (to host on GitHub, for example) using the following command.
git filter-branch --prune-empty --subdirectory-filter foldername master
This assumes you have already identified which components to extract and have sorted out how the build processes will work once the repositories are created.
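A sketch of how that command could be wrapped so the original repo stays untouched: it works on a fresh clone and rewrites only that clone. The function name extract_folder and its arguments are hypothetical. Note that Git's own documentation now recommends the separate git filter-repo tool over filter-branch, so treat this as the older approach.

```shell
# Extract one folder, with its history, into a new standalone repository.
# $1 = source repo (path or URL), $2 = folder to extract,
# $3 = directory for the new repo, $4 = branch to rewrite (default master)
extract_folder() {
  # Work on a fresh clone so the big repository itself is never rewritten.
  git clone "$1" "$3" &&
    # Keep only commits touching the folder and make it the new root.
    git -C "$3" filter-branch --prune-empty --subdirectory-filter "$2" "${4:-master}"
}
```

For example, extract_folder /path/to/bigrepo foldername /path/to/foldername-repo master leaves a repo whose root is the old foldername directory. You would then point its remote at the new location and push. Setting FILTER_BRANCH_SQUELCH_WARNING=1 in the environment skips the warning and pause that recent git versions add before filter-branch starts.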