Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Monolithic vs multiple Mercurial repos for modules within a suite of related applications

Tags:

mercurial

At my organization, we'd like to switch from using CVS to Mercurial for a variety of reasons. We've done a lot of investigation when trying to determine how we should organize our Hg repositories based on what we have in our codebase and how we tend to work. We've come up with satisfactory answers to most of our questions, but there's one point that's stumping us a little, and it's driving me mad because the most convenient way to organize the repo for our workflow just seems like the wrong way to go from a conceptual standpoint. I'm trying to figure out whether our perception of how this is "supposed" to work is wrong, or whether we're just bumping up against a legitimate feature gap in the available tooling.

Primarily we maintain a medium sized codebase consisting of a suite of applications that get all get released in the same package. Conceptually you can divide our code into three categories:

  1. Shared code
  2. Application code for our primary suite (uses the shared code)
  3. Miscellaneous small utilities that are infrequently maintained (uses the shared code)

This doesn't seem unusual to me, but I want to stress the point that we maintain the application code and the shared code at the same time and always want them to be bleeding edge with respect to each other. That is, we want all our application builds to always use the latest version and the same version of the shared code. We frequently add to or modify application code and shared code at the same time. Currently, the shared code is in one CVS module, and the applications are all in their own separate modules. The shared code and application modules are checked out such that the shared code gets built once and then linked into each application. We frequently do cvs commits that include changes across shared and application modules at once. We would really like to keep that ability.

I understand that commits in Hg are atomic within repositories -- that's fine but I'd like to be able to diff and commit to an application and a shared library at the "same time" (i.e. I don't care if it's really atomic but I don't want to have to manually do two separate diffs and two separate commit actions).

Conceptually, it seems like it would be correct to have one or a few repos for the shared code, and a separate repo for each application and each little utility program. This means you'd need to check out multiple repos for each build but that isn't a problem for us. The problem is there doesn't seem to be any tooling that lets you view updates or changes on multiple repos at once, or diff multiple repos at once and then sequentially commit them for you. This would be easy to script, but that wouldn't help developers who want to use various GUI frontends to complement the command line.

It seems like in order to be able to commit across multiple codebases at once (from a user's perspective) and keep everything on the bleeding edge together, the only two solutions are:

  1. Use a monolithic repo with EVERYTHING in it.
  2. Create some subrepos but access/commit everything through a big monolithic "main" repo that contains all the subrepos to keep everything on the latest revisions (which doesn't seem any better than (1) to me).

It can't be that unusual to want to work with multiple "peer" repositories at the same time, even if the actions aren't truly atomic across all of them -- and yet I'm not finding tons of articles or posts clamoring for this ability.

In summary:

  1. We would like to organize our code such that we can diff and commit application code and shared code at the same time from the user's perspective (they need not truly be atomic).
  2. It seems like we should be putting application code and shared code in separate repositories.
  3. Subrepositories tie parent repositories to specific revisions, which we do not want.

What am I missing here?

like image 984
rationull Avatar asked Jul 15 '11 23:07

rationull


1 Answers

In my shop, we have many projects that are simply in separate repos, but the main application's repo has 2 other projects in it. One is a module that shares a significant amount of code with the main application, and the other is for database migrations for the application (it's even in a different language). I wanted related changes in both the application and the migrator to be committed together, inseparably. Altogether, all source files in this repo are between 10 and 11 MB.

So if putting everything in one repository is really what makes sense because you don't want to deal with subrepositories, then there's nothing wrong with putting everything in one repository. The one of mine is on the small side of medium, in my opinion. TortoiseHg's source is around 20 MB, OGRE is over 100 MB.

Without knowing more about your projects and their relationships, the impression I get is that a single repository would work just fine, and that you're not looking at this incorrectly.

If you change your mind, hg convert can help you extract projects into their own repository, maintaining the history of those files.

If the one-repository approach is not for you, then I think subrepos should be given a chance, as that is the only other method I know of for treating multiple repos cohesively that is supported in TortoiseHg (see the Recommendations section).

However, I'm not sure how you would deal with the inter-department access, given that it doesn't seem there is an established subset already shared with others.

like image 108
Joel B Fant Avatar answered Nov 13 '22 04:11

Joel B Fant