I know there are many questions about the same thing, but I still need more information. I am investigating the possibility of migrating our SVN repo to Git and trying to understand which approach (monolithic trunk, submodules, subtrees, etc.) would be best for our repo.
Here is some information about our project and SVN repository:
Basically our structure looks like:
repo
|-application(war)
|-module1 (for example, ui stuff)
|--module1Submodule1
|--module1Submodule2
|-module2 (for example, database access stuff)
|-...
Each module has its own tags and branches.
The size of svn repo on my local machine with all branches, tags, etc is:
Typical use cases:
Future use cases:
The questions are:
Thank you in advance!
History
With any of the approaches mentioned, history can be preserved by using git svn: http://git-scm.com/book/en/Git-and-Other-Systems-Migrating-to-Git Even checking out earlier commits is possible.
However, there were suggestions not to preserve history and to simply leave the SVN repository frozen for about 6 months while all new changes go into the Git repo. I disagree with such advice because history is essential for our project. I doubt anyone would accept such a solution.
Giant trunk approach
Concern: with 200+ developers and QA engineers on the whole team, I suspect it will be quite difficult to get changes pushed.
Quiz
What are the steps, their cost and total cost of migration using this approach?
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
What are efficient developer workflows?
Submodules
Most caveats are explained here http://git-scm.com/book/en/Git-Tools-Submodules and here http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/
The main issue is that you will have to commit twice
Submodules were actually created for the case where a library is reused across different projects and you want to depend on a particular tag of that library, with the ability to update the reference in the future. However, we are not going to tag each commit (only release after each commit), and changing dependency versions (to released ones) in the war will be easier than maintaining submodules. Java dependency management makes things simpler.
Pointing a submodule at the head of a branch is not recommended and leads to trouble, so this approach is a dead end for moving to snapshots. And again, we don’t need it, because Java dependency management will do everything for us.
Quiz
What are the steps, their cost and total cost of migration using this approach?
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
What are efficient developer workflows? (Gerrit process is omitted)
Or
As you can see, the developer workflow is cumbersome (it always requires updating two places) and doesn’t suit our needs.
Subtrees
The main issue is that you will have to commit twice:
- Commit to the subtree-merged subdirectory
- Push the changes to the original repo
Subtrees are a better alternative to submodules: they are more robust and merge the source code of the modules into the aggregating repo instead of just referencing it. This makes the aggregating repo simpler to maintain. However, the problem with subtrees is the same as with submodules: making double commits is pointless overhead. You are also not forced to commit changes to the original module repo and can commit them to the aggregating repo only, which can lead to inconsistency between the repos.
The differences are explained quite well here: http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/
Quiz
What are the steps, their cost and total cost of migration using this approach?
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
What are efficient developer workflows? (Gerrit process is omitted)
Again, as with submodules, there is no sense in having two places (repos) where the code and its changes live. Not in our case.
Separate repos
Separate repos look like the best solution and follow the original intention of Git. The granularity of the repos can vary. The most fine-grained option is a repo per Maven release group; however, that can lead to too many repos. We also need to consider how often a single SVN commit affects several modules or release groups. If we see that a commit usually affects 3-4 release groups, then these groups should form one repo.
Also, I believe it’s worth at least separating API modules from implementation modules.
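The co-change check described above (how often one SVN commit touches several modules) can be estimated from the output of `svn log -v --xml`. The following is only a rough sketch: it assumes the first path segment names the module (as in the repo layout shown in the question), and the helper names and the sample log are made up for illustration. Grouping by Maven release group instead of by module would need a different `module_of` mapping.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the shape of `svn log -v --xml` output.
SAMPLE_LOG = """<log>
<logentry revision="1"><paths>
<path action="M">/module1/trunk/A.java</path>
<path action="M">/module2/trunk/B.java</path>
</paths></logentry>
<logentry revision="2"><paths>
<path action="M">/module1/trunk/A.java</path>
</paths></logentry>
<logentry revision="3"><paths>
<path action="M">/application/trunk/pom.xml</path>
<path action="M">/module1/trunk/A.java</path>
<path action="M">/module2/trunk/B.java</path>
</paths></logentry>
</log>"""


def changed_paths_per_commit(svn_log_xml):
    """Return one list of changed paths per <logentry> element."""
    root = ET.fromstring(svn_log_xml)
    return [[p.text for p in entry.iter("path")]
            for entry in root.iter("logentry")]


def module_of(path):
    """Assumption: the first path segment is the module name."""
    return path.strip("/").split("/")[0]


def co_change_counts(commits):
    """Count how often each pair of modules is changed in the same commit."""
    pairs = Counter()
    for paths in commits:
        modules = sorted({module_of(p) for p in paths})
        pairs.update(combinations(modules, 2))
    return pairs


if __name__ == "__main__":
    counts = co_change_counts(changed_paths_per_commit(SAMPLE_LOG))
    # Most frequently co-changed pairs first.
    for pair, n in counts.most_common():
        print(pair, n)
```

Pairs with a high count relative to the total number of commits are candidates for living in the same repo; pairs that rarely or never change together can probably be split.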
Quiz
What are the steps, their cost and total cost of migration using this approach?
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
What are efficient developer workflows?