
Keeping 2 git repositories synced

I am facing the following problem and am out of ideas:

My company does not allow direct internet access for our developers. Therefore we are in dire need of our own git repository. So far so normal. The project our developers are working on is supported by an external company that is also developing for us. This company has a git repository of its own. They do not have direct access to our git repo and we do not have direct access to theirs. Access is only provided via a secluded server that is able to reach their repo.

For better understanding:

My company's repo = A, external company's repo = B

Both of these repositories need to be kept in sync. Both have the same branches, and a change made in A should be carried over to B and vice versa. Both companies work on all the branches at the same time. I told them to work on disjoint branches, but they did not listen. Anyway...

My solution so far was this piece of script I was able to find here:

ORIGIN_URL="EXTERNAL REPO B"   # placeholder: URL of the external repo (B)
REPO1_URL="INTERNAL REPO A"    # placeholder: URL of the internal repo (A)

/usr/bin/git clone -c http.sslVerify=false --bare $ORIGIN_URL
/usr/bin/git remote add --mirror=fetch repo1 $REPO1_URL
/usr/bin/git -c http.sslVerify=false fetch --all
/usr/bin/git fetch repo1 --tags
/usr/bin/git push origin --all
/usr/bin/git push origin --tags
/usr/bin/git push repo1 --all
/usr/bin/git push repo1 --tags

The problem is, since both companies work on the same branches (i.e. A/fix1 and B/fix1) I am constantly graced with conflicts (Updates were rejected because a pushed branch tip is behind its remote (non-fast-forward)).

I am trying to find a script that will solve this problem for me and both companies.

I would even be grateful for some advice on how to resolve this one conflict I am facing over and over again.

Thank you for your help

Regards L.

Limboman asked Mar 05 '23 22:03



2 Answers

It sounds like you're thinking that their branch is "the same branch" as your branch if it has the same name. That's not necessarily true. One way to look at it is, git doesn't think about branches in two repos as "the same branch" ever; it just has rules for how it integrates changes between repos. Depending on how you configure those rules, you might think of them as "the same branch".

So the first thing is to configure the rules differently. Actually git's default behavior isn't too bad here; but setting --mirror=fetch on the repo1 remote overrides the default in a way that probably isn't helping. Things are a little simpler if we don't do that. We can also keep things a little simpler by manually adding both remotes instead of cloning one of the repos. (This isn't necessary; I just think it makes what's going on a little clearer.)

git init --bare
git remote add external "$ORIGIN_URL"
git remote add internal "$REPO1_URL"
git fetch --all

Now, supposing each repo had a branch1 and a branch2, and both of those diverged, your new repo looks like

       E <--(remotes/external/branch2)
      /
o -- x -- D <--(remotes/internal/branch2)
      \
       x -- A -- B <--(remotes/internal/branch1)
        \
         C <--(remotes/external/branch1)

From here, you can share the external branches to the internal repo without any concern about branch name conflicts by namespacing the branches.

git push internal 'refs/remotes/external/*:refs/heads/external/*'

Now your internal repo looks like

       E <--(external/branch2)
      /
o -- x -- D <--(branch2)
      \
       x -- A -- B <--(branch1)
        \
         C <--(external/branch1)

Of course the external changes aren't integrated with the internal ones, but that's the same as it would be if they had used different branch names per your original advice. It's expected - at some point someone has to merge external changes into internal branches (or vice versa), and that's when conflicts will have to be resolved.
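To try the namespaced push without touching the real servers, here is a minimal sketch. It assumes two throwaway bare repositories standing in for the internal and external repos; the `seed` helper and all paths are made up for the demo and are not part of the setup described in the question.

```shell
#!/bin/sh
# Sketch: local bare repos stand in for the real internal/external servers.
set -e
work=$(mktemp -d)
git init -q --bare "$work/internal.git"
git init -q --bare "$work/external.git"

# Hypothetical helper: create one commit on branch1 and push it to a repo.
seed() {  # usage: seed <bare-repo-path> <commit-message>
    tmp=$(mktemp -d)
    git init -q "$tmp"
    git -C "$tmp" checkout -q -b branch1
    git -C "$tmp" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
    git -C "$tmp" push -q "$1" branch1
    rm -rf "$tmp"
}
seed "$work/internal.git" "internal work"
seed "$work/external.git" "external work"

# The bridge repository, set up as described above.
git init -q --bare "$work/bridge.git"
cd "$work/bridge.git"
git remote add external "$work/external.git"
git remote add internal "$work/internal.git"
git fetch -q --all

# Publish the external branches to the internal repo under external/*.
git push -q internal 'refs/remotes/external/*:refs/heads/external/*'

# List the branches the internal repo has now.
git -C "$work/internal.git" for-each-ref --format='%(refname)' refs/heads
```

The final listing should show both `refs/heads/branch1` and `refs/heads/external/branch1`, i.e. the two same-named branches sit side by side instead of colliding.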

(You can, of course, use certain practices to make the merge conflict resolution as painless as possible - such as favoring short-lived branches and frequent incremental integrations. But you can't entirely eliminate them.)

You could similarly share the internal changes in un-integrated form with the external repo; e.g. by doing something like

git push external 'refs/remotes/internal/*:refs/heads/internal/*'

But this leaves some questions about who integrates changes and how, especially since it sounds like the external company isn't doing what's asked of them in this regard. So you might want to integrate their changes internally, and then share the integrated changes using the branch names they already know.

The trick to that is, you have to use a "fetch, integrate, push" model to avoid the "non-fast-forward" errors you're already seeing. When your working clones are able to communicate directly with the remote, this is typically done as

git pull
# resolve conflicts
git push

Because you have to use this bridge repository, and yet probably don't want to do all the integration work at that repo, you have extra steps. And that can be an annoyance, because the longer it takes to complete the fetch/integrate/push cycle, the more chance new changes appear after you fetch but before you push, requiring you to do yet another fetch/integrate/push cycle. Of course pushes are accepted or rejected on a ref-by-ref basis, so over time, it should work out (as attempt 1 successfully pushes branch A, and attempt 2 successfully pushes branches B and C, etc.).

So an integration workflow might look like this:

On the bridge repository

git fetch --all
git push external 'refs/remotes/internal/*:refs/heads/*'

This tries to directly update their branches. Some of the refs may be rejected; that's ok, you can retry them on the next cycle.

git push internal 'refs/remotes/external/*:refs/heads/external/*'

This should always succeed. To make sure it always succeeds, you should be sure to never make an internal commit to the external/* branches. For this reason you might want to use a non-branch ref (i.e. keep the external refs outside the refs/heads hierarchy), but it's not entirely clear where you'd put them. You could keep treating them like remote tracking refs:

git push internal 'refs/remotes/external/*:refs/remotes/external/*'

That's a little shady since the internal repo doesn't actually have a remote named external...

Anyway, one way or another your developers can now see the changes and integrate them into the local versions of the branches, resolving conflicts. Then on your next integration cycle when you fetch you'll get the merge commits, which you can try to push to the remote. Repeat as necessary.
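Put together, one pass of the bridge cycle could be sketched roughly as below. This is only an illustration under assumptions: it is meant to run inside the bridge repository with remotes named `internal` and `external`, the function name `sync_once` is made up, and it uses the branch-namespace variant (`external/*` under `refs/heads`). Because of that variant, it skips the internal repo's own `external/*` branches when pushing to the external repo, so they are not bounced back.

```shell
#!/bin/sh
# Sketch of one bridge pass. Run from inside the bridge repository.
sync_once() {
    git fetch -q --all --prune || return 1

    # Collect one refspec per internal branch, skipping the external/*
    # namespace that this script itself publishes to the internal repo.
    set --
    for ref in $(git for-each-ref --format='%(refname)' refs/remotes/internal); do
        name=${ref#refs/remotes/internal/}
        case $name in
            external/*|HEAD) continue ;;
        esac
        set -- "$@" "$ref:refs/heads/$name"
    done

    # Offer internal branch tips to the external repo. Diverged branches are
    # rejected (non-fast-forward); that is tolerated and retried next pass,
    # after someone has merged the external changes internally.
    if [ $# -gt 0 ]; then
        git push -q external "$@" ||
            echo "some branches were rejected; retry after the next merge" >&2
    fi

    # Publish external branch tips to the internal repo under external/*.
    git push -q internal 'refs/remotes/external/*:refs/heads/external/*'
}
```

You would call `sync_once` periodically (for example from a cron job on the bridge server) and watch stderr for rejected branches, which tell you where a merge is still pending.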

Of course this is predicated on "they don't seem to do what they're asked" as regards coordinating internal and external changes. The more you can have everyone using the repo on the same page, the fewer headaches you'll have. (Like in this case, having to do all integration internally, and potentially having delayed external visibility to internal changes.)

In that sense, I like the idea of pushing the internal refs to the external repo and the external refs to the internal repo so that both companies' devs can see both sets of changes. But what you don't want is to have external devs committing to internal branches or vice versa, because then the integrations will start getting weird, with branches like refs/heads/internal/external/master or something equally silly.

Mark Adelsberger answered Mar 31 '23 07:03



To make this all work, you (Company A) and they (Company B) need to have an agreed-upon sharing point. This Git repository clone does not have to be "the master" or "the source of all truth". That is, the two of you—the two companies, which we're pretending for the moment are not made up of many individuals and/or individual clones—can treat it in various different ways, which are up to the two of you; but you need it as a coordination site. You can host it anywhere you like, as long as both of you can reach it for reading, at least one of you can modify it, and if only one can modify it, that one—A or B again—has at least "read" access to a repository that the other publishes.

(Things are simpler, though, if the shared clone is considered "the master" or "the source of all truth", because humans are notoriously bad at determining reality when given multiple different points-of-view. 😅)

For simplicity, I'm mostly going to assume that someone within each of A and B has write (push) access to the shared repo. Let's call this shared repository SR. The rest of this is simply an approach; see Mark Adelsberger's answer for another.

To keep things organized, the branch names in shared repository SR can be rather simply prefixed: instead of having a master, develop, and so on, SR can have branches named A/master and B/master, A/develop and B/develop, and so on. Representatives of company A—either humans operating git push, or a machine-driven update operating by fetch from SR to some exposed point within A—deliver A's master to SR's A/master, and so on. This is pretty easy to do within Git because Git has this notion of branch renaming, especially in the fetch direction.
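The renaming is done with a refspec of the form `source:destination`. As a minimal sketch (the remote name `SR`, the `A/` prefix and the temporary paths are assumptions for the demo, with local directories standing in for the real repositories):

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/SR.git"        # stand-in for the shared repository
git init -q "$demo/A"                    # stand-in for a clone inside company A
cd "$demo/A"
git checkout -q -b master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "work from A"
git remote add SR "$demo/SR.git"

# Push direction: publish every local branch of A under the A/ prefix on SR.
git push -q SR 'refs/heads/*:refs/heads/A/*'

# Fetch direction: the same renaming, run from inside SR against a repository
# that A exposes for reading (here simply the demo clone).
git -C "$demo/SR.git" fetch -q "$demo/A" '+refs/heads/*:refs/heads/A/*'

# SR now holds A's master as A/master.
git -C "$demo/SR.git" for-each-ref --format='%(refname)' refs/heads
```

Either direction ends with the same result on SR; which one you use depends on whether A is allowed to push to SR or SR has to pull from A.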

(If you do use push to update these, consider installing a pre-receive or update hook that verifies that the authenticated push source is allowed to update the name in question. That is, you'd give a different login to representatives of A and B, and then check who is doing the push: is it an A user, or a B user? If it is an A user, all the branch names must begin with refs/heads/A/. That will avoid accidental overwrites.)
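A rough sketch of such an update hook follows. Git runs `hooks/update` once per pushed ref with the ref name, old hash and new hash as arguments, and a non-zero exit rejects that ref. How the hook learns who is pushing depends entirely on your server setup; `PUSH_COMPANY` below is a hypothetical environment variable (assumed to be set to `A` or `B` by whatever authenticates the login), and `ref_allowed` is a made-up helper name.

```shell
#!/bin/sh
# Sketch of hooks/update on SR. Arguments from git: <refname> <old> <new>.
# PUSH_COMPANY is hypothetical: assumed to be exported by the login layer.

# Succeeds only if <refname> lies under refs/heads/<company>/.
ref_allowed() {  # usage: ref_allowed <company> <refname>
    case "$2" in
        "refs/heads/$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ $# -eq 3 ]; then   # three arguments: we were invoked by git as a hook
    if ! ref_allowed "${PUSH_COMPANY:?PUSH_COMPANY is not set}" "$1"; then
        echo "rejected $1: $PUSH_COMPANY may only update refs/heads/$PUSH_COMPANY/*" >&2
        exit 1
    fi
fi
```

With this in place, an A login pushing to `refs/heads/B/master` (or to a tag) is refused for that ref only, while its `refs/heads/A/...` updates in the same push still go through.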

(If both A and B are to use tags, you will both need to use some fairly serious self-discipline to make sure you do not stomp on each other's tags. It might be wise to forbid tags entirely within SR, never pushing them from either A or B. This is because while Git is happy to rename branch names, any of the various --tags fetch or push operations don't rename tag names, so if someone at A calls something v1.2 and someone at B calls something else v1.2 you wind up with a tag name collision. Using --no-tags can avoid this headache, at the expense of never having any tags on SR at all.)
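One way to make the no-tags rule stick in the fetch direction is to add the remote with `git remote add --no-tags`, so every later fetch from it skips tags. A small sketch, assuming SR pulls from a repository that A exposes (the remote name `A` and the temporary paths are invented for the demo):

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
git init -q "$demo/A"                    # stand-in for A's repo, with a tag
cd "$demo/A"
git checkout -q -b master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "release"
git tag v1.2

git init -q --bare "$demo/SR.git"        # stand-in for the shared repository
cd "$demo/SR.git"
# --no-tags makes every "git fetch A" skip tags from now on.
git remote add --no-tags A "$demo/A"
git fetch -q A

# The branch arrived, the tag did not.
git for-each-ref --format='%(refname)'
```

In the push direction nothing special is needed as long as nobody passes `--tags` or `--follow-tags`, since a plain `git push` of branches does not send tags.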

In this particular setup, this allows each company to have an internal mirror of shared repository SR. The internal mirror tells you, regardless of whether you work at A or B, what they see: if you're at A, you inspect B/master or B/develop to see what their latest is. This internal mirror simply copies whatever's in SR. But it gives you access to the shared data, even though you have no direct access, not even to the shared repository SR.

To send something from A to B, a worker at A proposes the commits, and then someone at A who has the appropriate authority integrates those commits into some internal repository—possibly the one that acts as a mirror, or perhaps yet another repository. Git usage kind of encourages a lot of repository duplication like this, and it actually works very well. Now the person at A who has the authority pushes the commits to SR. If they land there, the person at A updates the A-accessible mirror as well, so that all programmers at A can see that these commits are available to programmers at B. At this point A/branch differs from B/branch on SR. It's now up to the people at B to integrate those into their repository. Once they do that, they will go through this same sort of dance (see below) and SR will have A/branch and B/branch matching again.

When programmers at B make some update, if programmers at A like that update, they can incorporate the new commits into their own repositories, then send them as updates via the same authorized-person technique. Now instead of B/branch being ahead of A/branch, the two are in sync.
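To check from a mirror whether the two sides are in sync, you can count the commits that are on one branch but not the other. A minimal sketch, where the branch names `A/branch` and `B/branch` follow the naming above and the repository is a throwaway stand-in for the mirror:

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
git init -q "$demo/mirror"
cd "$demo/mirror"
commit() {  # demo helper: make an empty commit with the given message
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
git checkout -q -b A/branch
commit "shared base"
git branch B/branch          # B starts from the same commit
commit "change from A"       # A moves one commit ahead

# Prints "<only-in-A> <only-in-B>", separated by a tab.
before=$(git rev-list --left-right --count A/branch...B/branch)
echo "before integration: $before"

# Simulate B integrating and publishing A's change (a fast-forward here).
git branch -f B/branch A/branch
after=$(git rev-list --left-right --count A/branch...B/branch)
echo "after integration:  $after"
```

Both numbers being zero means the two branches point at the same history; a non-zero left or right count tells you which company still has something to integrate.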

torek answered Mar 31 '23 08:03
