
How can I make a hierarchy of repositories with Git?

Tags:

git

github

I have a project with the following hierarchy:

Tharwa
|_tharwa-backend
|_tharwa-web
|_tharwa-mobile 

Each subfolder is a repository of its own; I want to create the repository Tharwa that puts everything together.

I have the following constraints, however:

  • I don't want to put them in the same repo as subfolders, because each has its own dependencies and configuration files, and I don't want their commits mixed up.

  • I don't want to leave them as completely separate repos either, since I have issues on the parent repo that might require work on, say, both the backend and mobile repos, and I would like to have such an issue solved on the same branch, for example:

    __________________________ master
     \________________________ develop
           \______/ login
    

My question is: how can I make something like this possible? Where do I have it wrong?

Please let me know if I didn't explain myself well. Thank you in advance.

Amine Birouk asked Jan 29 '23 14:01


2 Answers

There are basically 3 ways to see it:

  1. one repository with 3 directories, one per project
  2. one repository, probably almost empty, with 3 git submodules (each one a repository of its own, but tied to the main one)
  3. three completely separate repositories

I do not know where your constraint "I don't want to have their commits mixed up" comes from; maybe you just do not know git very well yet. For now, just note that git has powerful tools and options to view commits and filter them by date, author, path, content, etc. So in my opinion this is nothing to worry about. On the contrary, a single repository lets you show clearly, with a single commit, that file X in the first project changed at the same time as file Y in the second project (for example, because you changed an API and had to change its producer and its consumer together, which should be reflected in only one commit).
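As a rough illustration of that filtering, here is a minimal sketch that builds a throwaway single repository in a temporary directory and then restricts the history to one project directory. The file names, commit messages and the "demo" identity are made up for the example; only the directory names come from the question.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Single repository with one directory per project (option 1).
git init -q "$work/Tharwa"
cd "$work/Tharwa"
mkdir tharwa-backend tharwa-web tharwa-mobile

echo "web home page" > tharwa-web/index.html
git add . && git commit -q -m "web: add home page"

# One commit that changes the API producer and a consumer together.
echo "POST /login" > tharwa-backend/api.txt
echo "call POST /login" > tharwa-mobile/login.txt
git add . && git commit -q -m "login: add endpoint and mobile client call"

# History restricted to one project: only commits touching that path.
backend_log=$(git log --format=%s -- tharwa-backend)
web_log=$(git log --format=%s -- tharwa-web)
echo "backend: $backend_log"
echo "web:     $web_log"

# Filtering by author and date.
git log --oneline --author=demo --since="1 week ago"

# Files changed by the latest commit, across projects.
git show --stat --format=%s HEAD
```

The path filter is what keeps the projects' histories readable separately even though they share one commit graph.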

But if you want strict commit isolation, you have it in option 3, and also in option 2. Not in option 1: there, one commit could cover changes in any of the subprojects.

As for your second constraint (solving one issue on a single branch across several projects), it is immediately possible in option 1, kind of possible in 2, but certainly not in 3.

git submodules deserve a discussion of their own, as they come with their own constraints. Make sure to read and learn about them before using them on a large scale. Here are some interesting links about them, besides the official documentation (first link):

  • https://git-scm.com/book/en/v2/Git-Tools-Submodules
  • https://github.com/blog/2104-working-with-submodules (see Advice on using submodules or not, at end)
  • https://medium.com/@porteneuve/mastering-git-submodules-34c65e940407 (discussion on submodules vs subtrees, basically option 2 vs 1)
  • https://www.atlassian.com/blog/git/git-submodules-workflows-tips (some tips on workflows with submodules)
  • https://www.atlassian.com/blog/git/git-submodules-workflows-tips (submodules and branches)
  • https://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/ (some arguments against them)

As for your specific question on submodules and branches, have a look at this question and its answers: Git submodules: Specify a branch/tag
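To make the "kind of possible" in option 2 more concrete, here is a minimal sketch, not a recommended workflow. It creates local stand-ins for two of the project repositories, adds them to a parent repository as submodules that track a develop branch, and then creates the same login branch in the parent and in each submodule. The temporary paths and the "demo" identity are assumptions, and the protocol.file.allow override is only there because the demo uses local paths instead of real remote URLs.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for the existing project repositories, each with a develop branch.
mkdir -p "$work/remotes"
for name in tharwa-backend tharwa-mobile; do
  git init -q "$work/remotes/$name"
  (
    cd "$work/remotes/$name"
    echo "$name" > README
    git add README
    git commit -q -m "initial commit"
    git checkout -q -b develop
  )
done

# Parent repository: -b records the branch to track in .gitmodules.
git init -q "$work/Tharwa"
cd "$work/Tharwa"
for name in tharwa-backend tharwa-mobile; do
  git -c protocol.file.allow=always submodule --quiet add -b develop \
    "$work/remotes/$name" "$name"
done
git commit -q -m "Add backend and mobile as submodules"

# Create the same feature branch in the parent and in every submodule.
git checkout -q -b login
git submodule --quiet foreach 'git checkout -q -b login'

branch_recorded=$(git config -f .gitmodules submodule.tharwa-backend.branch)
backend_branch=$(git -C tharwa-backend rev-parse --abbrev-ref HEAD)
mobile_branch=$(git -C tharwa-mobile rev-parse --abbrev-ref HEAD)
echo "tracked branch in .gitmodules: $branch_recorded"
echo "backend is on: $backend_branch, mobile is on: $mobile_branch"
```

Note that the branches only share a name. Each submodule still has its own history, and the parent has to commit the new submodule pointers separately.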

As I wrote in my comment, things also depend on how you package and distribute this software. Is it always one piece of code deployed as is (options 1 and 2 would make more sense), or can you release one project separately from the others (option 3 would make more sense)? Note that I said "make more sense" because it is not black and white: you can achieve your goals with any option; the compromises are just different.

It also depends on the team of developers who will work on these projects. What is their level of knowledge of git? Submodules are not something I would recommend to git beginners. And how are commits pushed and pulled between remote repositories? In option 1 you have one repository to look after; in option 2 also one, but you need to update the submodules (see the documentation); and in option 3 you have 3 separate repositories to handle.
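The extra submodule step in option 2 looks roughly like the sketch below. It uses a local stand-in for the backend repository, a parent repository that pins it, and a "colleague" clone; all paths, commit messages and the "demo" identity are hypothetical. The protocol.file.allow override is again only needed because the demo clones from local paths. With real remotes, git clone --recurse-submodules combines the clone and the first update.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for an existing project repository.
git init -q "$work/tharwa-backend"
(cd "$work/tharwa-backend" && echo v1 > api.txt && git add api.txt && git commit -q -m "backend v1")

# Parent repository pinning the backend as a submodule.
git init -q "$work/Tharwa"
(
  cd "$work/Tharwa"
  git -c protocol.file.allow=always submodule --quiet add "$work/tharwa-backend" tharwa-backend
  git commit -q -m "Add backend submodule"
)

# A colleague clones the parent; the submodule directory starts out empty...
git clone -q "$work/Tharwa" "$work/colleague"
cd "$work/colleague"
# ...until it is initialised and checked out at the pinned commit.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
pinned=$(git -C tharwa-backend log -1 --format=%s)
echo "after first update: $pinned"

# The backend moves on, and the parent records the new commit.
(cd "$work/tharwa-backend" && echo v2 > api.txt && git commit -q -am "backend v2")
(
  cd "$work/Tharwa"
  git -c protocol.file.allow=always submodule --quiet update --remote
  git commit -q -am "Bump backend submodule"
)

# The colleague pulls the parent, then updates the submodule to match it.
git -c protocol.file.allow=always pull -q
git -c protocol.file.allow=always submodule --quiet update
after=$(git -C tharwa-backend log -1 --format=%s)
echo "after pull and update: $after"
```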

There may also be other side points to take into account, though they may be irrelevant if you start with empty content. Size is one: a repository can carry a lot of history, which impacts git clone, for example (so with separate repositories, one big repository does not slow down work on the others).

You seem to hint at the workflow described at http://nvie.com/posts/a-successful-git-branching-model/ which is a good start. If you want to stick to it, it will be easy in option 1, mostly but not exactly possible in 2, and not possible in 3. (Also have a look at https://www.atlassian.com/git/tutorials/comparing-workflows for some other possible workflows.)

It really seems to me your 2 constraints are going in opposite directions, so you would need to see which one is more important than the other.

As for myself, but without the whole picture you have, I would favor option 1 as it seems the most flexible one (and from it you can easily switch to option 2 or 3 later).
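One possible way to move from option 1 towards option 3 later is to extract a subdirectory, with its history, into a repository of its own. The sketch below does this on a throwaway clone with git filter-branch --subdirectory-filter. This is only one way to do it: the Git documentation discourages filter-branch and points to the external git-filter-repo tool, and filter-branch is used here only because it ships with Git. Repository contents, commit messages and the "demo" identity are made up.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Silence the notice that recommends git-filter-repo over filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Single repository with two project directories and three commits.
git init -q "$work/Tharwa"
cd "$work/Tharwa"
mkdir tharwa-backend tharwa-web
echo "api" > tharwa-backend/api.txt
git add . && git commit -q -m "backend: add API"
echo "page" > tharwa-web/index.html
git add . && git commit -q -m "web: add page"
echo "api v2" > tharwa-backend/api.txt
git commit -q -am "backend: extend API"

# Work on a throwaway clone so the original repository stays untouched.
git clone -q "$work/Tharwa" "$work/tharwa-backend"
cd "$work/tharwa-backend"

# Keep only the history of tharwa-backend/ and make it the repository root.
git filter-branch --subdirectory-filter tharwa-backend HEAD > /dev/null

# Only the two backend commits remain, and api.txt now sits at the top level.
git log --format=%s
ls
```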

Patrick Mevzek answered Jan 31 '23 20:01


I don't want to put them in the same repo as subfolders because each have its own dependencies and configuration files, and also I don't want to have their commits mixed up.

Ok, you don't want their commits mixed up.

I don't want to leave them as separate repos since I have issues on the parent repo that might require work to be done on ,say, both the back end and mobile repos,and i would like the have the issue solved on the same branch...

...except when you DO want their commits mixed up. ;)

Where do I have it wrong?

If you habitually have to change multiple repositories simultaneously, you may want to consider whether they're actually a single repository. There are two good ways to handle this; subrepositories are not one of them.


One repo

One is to make them a single repository. If they're all pieces of the same project, and they have changes which depend on each other, they're a single repository. It's OK for them to be subfolders with their own configuration and dependencies; this is fairly common for large projects that need to be developed together but split for distribution.

The downside is developers are likely to take advantage of this and tightly bind the client code to the backend. Without clear separations between the projects the backend API is likely to get sloppy. The clients are more likely to take advantage of undocumented backend features making the whole system brittle and resistant to change. Adding a new client, like maybe tharwa-api, will become more difficult.

If you have 3rd parties writing their own clients for the tharwa-backend, they're at a disadvantage. mobile and web are in a privileged position: they can be in lock-step with backend. 3rd party developers aren't so lucky, and your project will be harder to contribute to.

And once you weld your projects together, you're not likely to ever pull them apart again.


Many repos, strict dependencies.

The other is to enforce the encapsulation between the pieces more strictly by having each repo treat the others as normal dependencies. In your login example...

  • Implement, test, and commit the change on backend.
  • Release backend, even if only for internal distribution.
  • Test web and mobile against the new backend to ensure backwards compatibility is maintained.
  • Some dependency mechanisms allow drawing dependencies directly from a Git repo.
  • Have web and mobile update their backend dependency and use the new feature.
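With plain Git, the "release backend" and "update the dependency" steps above could look something like this sketch. It assumes a release is simply an annotated tag and that a client fetches the backend at exactly that tag. The version numbers, file contents, target directory and the "demo" identity are all hypothetical, and a real project would more likely go through its package manager.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the backend repository with an earlier release.
git init -q "$work/tharwa-backend"
cd "$work/tharwa-backend"
echo "GET /status" > api.txt
git add api.txt && git commit -q -m "Initial API"
git tag -a v1.0.0 -m "Release 1.0.0"

# Implement and commit the change on backend, then release it.
echo "POST /login" >> api.txt
git commit -q -am "Add login endpoint"
git tag -a v1.1.0 -m "Release 1.1.0: login endpoint"

# Unreleased work continues after the tag.
echo "POST /logout" >> api.txt
git commit -q -am "WIP: logout endpoint"

# A client build fetches exactly the released version, not the latest commit.
git -c advice.detachedHead=false clone -q --branch v1.1.0 \
  "$work/tharwa-backend" "$work/backend-1.1.0"
released=$(git -C "$work/backend-1.1.0" describe --tags)
echo "client builds against: $released"
cat "$work/backend-1.1.0/api.txt"
```

The client sees the login endpoint but not the unreleased logout work, which is the "air gap" a release provides.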

Now it is harder for developers to cheat. The extra step of a release (which shouldn't take more than a minute or two) provides an "air gap". backend has to develop its own unit, integration, and acceptance tests; it can't rely on the clients to do that for it. The clients have to be more robust and adhere more strictly to the backend API. With the backend and clients decoupled, it will be easier to make radical changes to the internals of each.

Developers can still make lockstep changes, but they're now explicit. Making them explicit discourages their use and prevents devs from getting lazy.

But it does add some more overhead. backend changes must be fully thought through, developed, and documented. The backend API must be more fully developed and robust. The clients must adhere more closely to the API. All this is good software engineering and will speed things up in the mid and long-term.


Why not submodules?

Submodules provide most of the upsides of a single repo, but add a confusing feature. They also come with all of the downsides of a single repo, plus one more: a lack of coordination.

With a single repo, one commit is one commit and one branch is one branch. With submodules, it's difficult to tell by looking at a single repository which commits must be coordinated across all the repositories. These coordinated commits can happen at any time, without warning.

You'll want some procedures and mechanisms to track and coordinate these commits. You could build this all yourself through trial and error, maybe something with tags or special commit messages.
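For instance, a home-grown convention based on commit messages might look like the following sketch. The "Issue: #42" trailer is an invented convention, not a Git feature; the repositories are local stand-ins and the "demo" identity is hypothetical. It only shows that such coordination has to be searched for repository by repository.

```shell
set -eu
work=$(mktemp -d)
# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for two of the separate repositories.
for name in tharwa-backend tharwa-mobile; do
  git init -q "$work/$name"
  (
    cd "$work/$name"
    echo "$name" > README
    git add README
    git commit -q -m "initial commit"
    echo "login" > login.txt
    git add login.txt
    # "Issue: #42" marks this commit as part of a coordinated change.
    git commit -q -m "Add login support" -m "Issue: #42"
  )
done

# List every commit, in every repository, that belongs to issue #42.
matches=$(
  for name in tharwa-backend tharwa-mobile; do
    git -C "$work/$name" log --format="$name %s" --grep='Issue: #42'
  done
)
echo "$matches"
```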

Or you could use an existing release dependency system.


Which you choose depends on your project. However, I'd recommend you try the full decoupling and see how it goes. It encourages good software engineering practices. And you can always put them back together later; it's difficult to go the other way around.

Schwern answered Jan 31 '23 18:01