We have a reasonably large, and far too messy code base that we wish to migrate to using Git. At the moment, it's a big monolithic chunk that can't easily be split into smaller independent components. The code builds a large number of shared libraries, but their source code is so interleaved that it can't be cleanly separated into separate repositories at the moment.
I'm not too concerned with whether Git can cope with having all the code in a single repository, but a problem is that we need to version both the source code and many of the libraries built from it. Building everything from scratch takes hours, so when checking out the code, developers should also get precompiled versions of these libraries to save time.
And this is where I could use some advice. The libraries don't need to be 100% up to date (as they generally maintain binary compatibility, and can always be rebuilt by the individual developer if necessary), so I'm looking for ways to avoid cluttering up our source code repository with countless marginally different versions of binary files that can be regenerated from the source anyway, while still making the libraries easily accessible to developers so they don't have to rebuild everything from scratch.
So I'd like some way to achieve something like the following. When checking out the source, the libraries built from it should follow (or at least, be easy to get). And when committing, it shouldn't be possible to accidentally commit new versions of these libraries just because they were recompiled and now have a different timestamp embedded. For example,

git commit -a

shouldn't end up accidentally polluting the repository with a new revision of all these generated files. Of course, at the same time, the process of using these should be as smooth as possible.
I've been looking at the option of using Git's submodules, creating the "super" repository containing the source code, and then one or more submodules for the generated libraries, but so far it seems a bit too clumsy and fragile for my taste. It seems that they don't actually prevent the developer from committing changes directly to the submodule; it just causes things to break further down the line (while playing around with submodules, I've ended up with more detached HEADs than I care to count).
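To give a rough sketch of how I keep ending up there (the directory names and the commit hash below are made up):

cd super-repo
git submodule update      # checks out the commit recorded by the super repository,
                          # leaving the submodule on a detached HEAD
cd libs
git status                # reports "HEAD detached at 1a2b3c4"
git commit -am "tweak"    # the commit lands on no branch, and the super repository
                          # still points at the old submodule commit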
Considering virtually all our developers are new to Git, that may end up wasting more time than it saves us.
So what are our options? Does the submodule approach sound sensible to you Git gurus out there? And how do I "tame" it, so it's as easy to use (and hard to mess up) as possible for our developers?
Or is there an entirely different solution we haven't considered?
I should mention that I've only used Git for a couple of days, so I'm pretty much a newbie myself.
The ideal solution is to avoid versioning the binaries in Git at all and instead store them in an artifact repository like Nexus.
The issue with storing deliveries (built binaries) in a VCS is that a VCS is designed to record and keep the history of every file it manages, whereas a built library is an artifact that can be regenerated from the source at any time; versioning every marginally different rebuild only bloats that history.
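As a rough sketch of what the artifact-repository approach can look like (the server URL, repository name and archive layout below are purely illustrative placeholders), a small script fetches the prebuilt libraries instead of Git:

#!/bin/sh
# Illustrative only: pull a prebuilt library bundle from an artifact
# repository (a Nexus-style raw repository is assumed here).
NEXUS_URL=https://nexus.example.com/repository/prebuilt-libs
VERSION=${1:-latest}

mkdir -p prebuilt
curl -fSL "$NEXUS_URL/myproject/libs-$VERSION.tar.gz" -o libs.tar.gz
tar -xzf libs.tar.gz -C prebuilt    # developers get the libraries without rebuilding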
I would keep these in a separate repository from the source files. You can use Git submodules to keep a reference between the two, so that the 'compiled libs' repository becomes the parent and the source becomes the submodule. That way, when you commit the libs, you commit a reference to the exact state of the source code at that time.
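A minimal sketch of that layout, assuming the libraries live in a libs.git repository and the source in source.git (the URLs, paths and the <built-revision> placeholder are illustrative):

# On the build server, inside a clone of libs.git (the parent repository):
git submodule add https://example.com/git/source.git src
git commit -m "Track the source tree as a submodule"

# After each successful build: record which source revision was built,
# copy the fresh libraries in, and commit both together.
git -C src checkout <built-revision>
mkdir -p lib
cp -r /path/to/build/output/. lib/
git add src lib
git commit -m "Prebuilt libraries for the recorded source revision"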
Further, since developers don't need the full history, you can use git clone --depth 1 libs.git, which gives them only the latest version of the libs. It doesn't pull the earlier history, and it doesn't let them push new versions back (which is fine, since the build server should be producing those for you), and it gives them access to the latest version (or whatever branch you specify on the clone command with -b).
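For a developer, that boils down to something like this (the URL and branch name are placeholders):

git clone --depth 1 -b master https://example.com/git/libs.git
cd libs
git submodule update --init    # also pull in the matching source checkout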
Ideally you don't want the main git repository containing, or pointing to, the binary repository.
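One simple way to keep that separation, and to keep git commit -a from picking up freshly rebuilt binaries, is a .gitignore at the top of the source repository; the patterns below are only examples:

# .gitignore in the source repository (patterns are illustrative)
*.so
*.a
*.dll
build/
prebuilt/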