Storing generated files in Git

We have a reasonably large and far too messy code base that we wish to migrate to Git. At the moment it's a big monolithic chunk that can't easily be split into smaller independent components. The code builds a large number of shared libraries, but their source is so interleaved that it can't currently be separated cleanly into independent repositories.

I'm not too concerned with whether Git can cope with having all the code in a single repository, but a problem is that we need to version both the source code and many of the libraries built from it. Building everything from scratch takes hours, so when checking out the code, developers should also get precompiled versions of these libraries to save time.

And this is where I could use some advice. The libraries don't need to be 100% up to date (as they generally maintain binary compatibility, and can always be rebuilt by the individual developer if necessary), so I'm looking for ways to to avoid cluttering up our source code repository with countless marginally different versions of binary files which can be regenerated from the source anyway, while still making the libraries easily accessible to developers so they don't have to rebuild everything from scratch.

So I'd like some way to achieve something like the following.

  • the libraries are generated by our build server on a regular basis, and the server could then commit them to the Git repository. Developers should treat these files as read-only (pull the latest version and, when necessary, rebuild in place, but never commit new versions), and ideally Git should enforce this. In particular, a developer running a quick git commit -a shouldn't accidentally pollute the repository with a new revision of all these generated files. (A hook sketch for this follows the list.)
  • keep these files in a separate repository, so the source code won't have to carry around all these generated binary files perpetually (since they're a convenience to cut down on compilation time, but they're not actually necessary).
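
One way to approximate that read-only behaviour is a client-side pre-commit hook that rejects any commit touching the generated libraries. This is only a sketch, and it assumes the libraries live under a prebuilt/ directory (the path and messages are placeholders):

    #!/bin/sh
    # .git/hooks/pre-commit -- refuse commits that stage anything under prebuilt/
    if git diff --cached --name-only | grep -q '^prebuilt/'; then
        echo "error: commit touches generated libraries under prebuilt/" >&2
        echo "these are published by the build server only" >&2
        exit 1
    fi

Bear in mind that client-side hooks aren't copied by git clone, so each developer has to install it (or you distribute it via a setup script); a server-side update hook, or simply restricting push access to the build server, is the stronger guarantee.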

Of course, at the same time, the process of using these should be as smooth as possible. When checking out the source, the libraries built from it should follow (or at least, be easy to get). And when committing, it shouldn't be possible to accidentally commit new versions of these libraries, just because they were recompiled and now have a different timestamp embedded.

I've been looking at the option of using Git's submodules, creating a "super" repository containing the source code and then one or more submodules for the generated libraries, but so far it seems a bit too clumsy and fragile for my taste. Submodules don't actually seem to prevent a developer from committing changes directly to the submodule; it just causes things to break further down the line (while playing around with submodules, I've ended up with more detached HEADs than I care to count).
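
For reference, here is a minimal sketch of the layout being described; the URLs and paths are hypothetical:

    # add the prebuilt libraries as a submodule of the existing source repository
    cd source/
    git submodule add git://buildserver/prebuilt-libs.git libs
    git commit -m "Reference prebuilt libraries as a submodule"

    # a submodule is checked out at a fixed commit rather than a branch, which is
    # why any work done inside libs/ starts from a detached HEAD
    cd libs
    git status    # typically reports: HEAD detached at <commit>

The detached HEAD isn't a bug so much as how submodules work: the super repository pins an exact commit of the submodule, not a branch.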

Considering virtually all our developers are new to Git, that may end up wasting more time than it saves us.

So what are our options? Does the submodule approach sound sensible to you Git gurus out there? And how do I "tame" it, so it's as easy to use (and hard to mess up) as possible for our developers?

Or is there an entirely different solution we haven't considered?

I should mention that I've only used Git for a couple of days, so I'm pretty much a newbie myself.

Asked Apr 12 '11 by jalf


2 Answers

The ideal solution is to avoid versioning binaries and store them in an artifact repository like Nexus.
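
As a sketch of what publishing from the build server might look like, assuming a Nexus 3 "raw" hosted repository named prebuilt-libs (the URL, repository name and credentials are all placeholders):

    # build server: upload a freshly built library over HTTP
    curl --fail -u build-bot:secret \
         --upload-file build/libfoo.so \
         https://nexus.example.com/repository/prebuilt-libs/nightly/libfoo.so

    # developer: fetch the prebuilt library instead of rebuilding it
    curl --fail -o libfoo.so \
         https://nexus.example.com/repository/prebuilt-libs/nightly/libfoo.so

Any artifact repository (Artifactory, or even a plain file share) works the same way in principle: binaries are published and fetched by version rather than tracked in history.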

The issue with deliveries in a VCS is that a VCS is designed to record and keep the history of all files it manages, whereas:

  • many versions of a delivery are intermediate builds that will need to be cleaned up at one point or another
  • cleaning up (removing old versions) is quite hard to do in a VCS, but very easy in an artifact repository
  • the size of the repository becomes an issue (especially with a DVCS), unless developers only ever need the latest version, in which case a shallow clone can alleviate it; see the sketch after this list
  • there is no meaningful way to compare one version of a binary with another (so "versioning" them doesn't make a lot of sense)
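
As a rough illustration of the size point above (the repository URL is a placeholder):

    # every committed rebuild of a library adds a full new blob to history, so the
    # pack size only ever grows; this reports the current size of the local repo
    git count-objects -vH

    # a shallow clone limits what each developer downloads, but it doesn't shrink
    # what the server (and full clones) have to store
    git clone --depth 1 git://buildserver/prebuilt-libs.git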
Answered by VonC


I would keep these in a separate repository from the source files. You can use Git submodules to keep a reference between the two, so that the 'compiled libs' repository becomes the parent and the source becomes the submodule. That way, when you commit the libs, you also commit a reference to the exact state of the source code they were built from.
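
A minimal sketch of what the build server's publishing step could look like under that arrangement; the repository layout, build command and branch names are assumptions:

    # libs.git is the parent repository; the source tree lives in it as a submodule
    cd libs/
    git -C source pull origin master            # update the source submodule
    make -C source                              # rebuild (placeholder build step)
    cp source/out/*.so .                        # copy the resulting libraries
    git add -- *.so source                      # stage the libs and the submodule pointer
    git commit -m "Rebuild libs from source $(git -C source rev-parse --short HEAD)"
    git push origin master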

Further, since developers don't need the full history, you can use git clone --depth 1 libs.git, which gives them only the latest version of the libs. It doesn't pull any further history, and (at least in older versions of Git) a shallow clone can't be pushed from either, which is fine since the build server should be the one publishing these. Developers still get the latest versions of whatever branch you specify on the clone command with -b, as shown below.
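
Put together, a developer's first checkout could look something like this (the URL and branch name are placeholders):

    # fetch only the newest snapshot of the libs repository, from a given branch
    git clone --depth 1 -b stable git://buildserver/libs.git
    cd libs
    git submodule update --init    # pull in the matching revision of the source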

Ideally you don't want the main git repository containing, or pointing to, the binary repository.

Answered by AlBlue