
Git repo: internal and open source external branches

What is the best way to set up a git repo for a project that your company uses internally, but you also want to open-source (but with a potentially modified history)?

Let's say Acme company has a repo "supercoolproject". They want to open source it, but they don't actually want the company name associated with it at all. They set up a GitHub account under one of their developer's names (or a group, etc), and create the repo. They clone this to an internal Acme server. Nowhere is "Acme" mentioned.

Now comes the problem - in any given organization there are developers who understand open source and are authorized to push some code public. There are others who don't understand all the nuances. When one of the latter makes a commit, perhaps they include the company name or some other proprietary information. Or they just make a horrible commit that can be reverted internally (not rewriting history - I'm just talking about adding a "revert" commit). But you don't want those proprietary commits going out into the open source branches.

So, you create "acme_internal_{dev,qa,production}" branches, and an external "master" branch (and maybe others). What's the best way to keep those in sync? You want to accept commits on the open source repos. And you want to push (most of) your internal commits out. But there are some that shouldn't go out.

It seems that merging internal -> external is a bad thing because you can't remove the bad commits. Rebasing the external branches on the internal ones could be done, but it seems that as soon as you "git rebase -i acme/acme_internal_dev" one time and modify history (change commit messages, remove commits, etc) you can no longer rebase because the two histories diverge. So, do you end up cherry-picking all internal commits out to the public branch and then merging the public branch into the internal tree? That seems ugly too because you end up with duplicated commits internally (the original, and then the cherry-picked one that went into the external and was merged back into the internal).

For the purpose of this question, let's assume that internally Acme wants to avoid rewriting history (actually removing/modifying the bad commits) on their internal branches.

Jeremy Thomerson asked Oct 08 '22 15:10


2 Answers

The solution is to have a tightly controlled "externally visible" repository which can only be committed to by those developers who have permission to push up to GitHub.

Code from the internal repository only makes it into the externally visible repository when the permitted developers integrate and merge it. In short, non-permitted developers have to submit their code to the permitted developers via patch files or pull requests against the public repo.

Yes, that means the people with permission will have to review and integrate every patch. But since you don't trust the non-permitted developers, you want that review to happen anyhow.

wadesworld answered Oct 12 '22 10:10


There are a few measures you can take to leverage the DVCS nature of that dual repo you want to maintain.


First, never directly expose an internal repo to the world (with the idea of having an "external" branch). There is no such thing as an "external branch", only "external -- or 'public' -- repos".

A possible setup is to have a repo exposed to the world (which external contributors can push to or pull from).


Second, never push (from within acme) directly to that external repo: mistakes are too easily made, and you don't control the pace at which others pull. I.e., once you push the wrong stuff, even a swift correction might come too late.

You need an intermediate repo, still managed internally, for review purposes. I.e., to inspect what has been pushed and, if those new commits are OK, pull them into the external repo (the pull is run from the external repo).
That means the external repo knows about the intermediate repo (it has it listed in its remotes), but the reverse is not true (you cannot push by mistake from the internal repo).
That makes for a more explicit publication process (you must go to the external repo server and pull the changes you want to publish, as opposed to staying in a familiar internal environment and pushing somewhat carelessly).
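
The pull-only publication step described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the answer: the external repo sits on a server Acme can log into, it has a remote hypothetically named "intermediate" configured with git's default fetch refspec, and the public branch is "master". The function names are made up for the example.

```python
import subprocess
from typing import Callable, List


def publication_commands(remote: str = "intermediate",
                         branch: str = "master") -> List[List[str]]:
    """Commands to run, in order, from inside the external repo.

    The remote and branch names are assumptions made for this sketch.
    """
    public = f"refs/heads/{branch}"
    reviewed = f"refs/remotes/{remote}/{branch}"
    return [
        # 1. Fetch the reviewed commits into a remote-tracking ref only.
        ["git", "fetch", remote],
        # 2. List what is about to be published, for a last look.
        ["git", "log", "--oneline", f"{public}..{reviewed}"],
        # 3. Exits non-zero unless the update would be a fast-forward.
        ["git", "merge-base", "--is-ancestor", public, reviewed],
        # 4. Move the public branch to the reviewed commit.
        ["git", "update-ref", public, reviewed],
    ]


def publish(repo_dir: str, run: Callable = subprocess.run) -> None:
    """Run the publication commands; check=True stops at the first failure."""
    for cmd in publication_commands():
        run(cmd, cwd=repo_dir, check=True)
```

Because every command runs inside the external repo, nothing in the intermediate or internal repos needs to know the external repo's address, which is the point of the one-way relationship between the remotes.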


Make good use, on the intermediate repo (the one where acme's developers can push to for review before publishing), of:

  • pre-receive hooks (to run all kinds of checks: if the commit doesn't meet the criteria for publication, it is rejected and the developer can then rewrite history in his/her own repo).
    Again, rewriting history is acceptable, as long as it is confined to acme's developers' own repos.
  • content filter driver (see for instance this question), in order to avoid versioning different contents between the two kinds of repos for sensitive files (as in "Something like gitignore but not git ignore").
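
As a rough illustration of the first bullet, here is a minimal sketch of the checking logic a pre-receive hook on the intermediate repo might use. The forbidden pattern list and all function names are hypothetical; the sketch only relies on the fact that git feeds a pre-receive hook one "old new ref" line per updated ref on standard input and rejects the push when the hook exits non-zero.

```python
import re
import subprocess
from typing import Callable, Iterable, List, Tuple

# Hypothetical patterns; a real deployment would maintain its own list.
FORBIDDEN = [re.compile(r"acme", re.IGNORECASE)]


def git_range_text(old: str, new: str) -> str:
    """Return authors, messages and patches for the pushed commits."""
    if set(old) == {"0"}:
        # New ref: look only at commits not reachable from existing refs.
        revs = [new, "--not", "--all"]
    else:
        revs = [f"{old}..{new}"]
    cmd = ["git", "log", "-p", "--format=%an <%ae>%n%B"] + revs
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout


def violations(text: str, patterns=FORBIDDEN) -> List[str]:
    """Return every line of text that matches a forbidden pattern."""
    return [line for line in text.splitlines()
            if any(p.search(line) for p in patterns)]


def check_push(ref_updates: Iterable[str],
               read_text: Callable[[str, str], str] = git_range_text
               ) -> List[Tuple[str, List[str]]]:
    """Return (ref, offending lines) for each pushed ref that fails."""
    rejected = []
    for line in ref_updates:
        parts = line.split()
        if len(parts) != 3:
            continue
        old, new, ref = parts
        if set(new) == {"0"}:
            continue  # ref deletion: there is no new content to inspect
        hits = violations(read_text(old, new))
        if hits:
            rejected.append((ref, hits))
    return rejected
```

An actual hook script would call `check_push(sys.stdin)`, print the offending lines so the developer knows what to fix, and exit with a non-zero status when the returned list is not empty.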
VonC answered Oct 12 '22 10:10