Restructuring a Git repo with prototype code into an actual publishable project

Tags:

git

Lately I've been working on an API library which wraps parts of a relatively large external API into a more idiomatic structure. As I did my API exploration while writing the prototype code, I ended up implementing three of the available sub-APIs with varying degrees of functionality. Or to put it in simpler terms, I have a project which structurally looks like

dir:root
 └ dir:feature-a
 └ dir:feature-b
 └ dir:feature-c
 └ dir:common
 └ file:build.gradle
 └ file:build.py

where each feature corresponds to one of the sub-APIs. It's worth mentioning that the directories aren't flat; I just omitted subdirectories for simplicity's sake.

My main problem is that while I did, for once, provide a semi-decent version history, it's all in one branch and only one of the sub-APIs is ready to be released. Ideally, I'd like to find the most convenient way to

Split the existing repo by turning each feature into its own branch, so that I can publish them one by one as they mature
Keep the current version history (with some rebasing, possibly)

I have previously used git filter-branch for a similar purpose, but the one major curveball here is that the repository root is actually another repository. On a meta level the repository has two parents, which admittedly is funky but very useful for keeping the build scripts up to date. If I tried to do what I want with filter-branch, the build scripts at the root of the project would get removed, which is definitely not what I want.

Finally, the common directory is a bit of a special case: I don't mind cutting its version history, as long as its contents are there.

asked May 31 '15 09:05

Esko

1 Answer

Summary

If you want to retain the history of some common resources (build.*) and keep those resources easily mergeable in the future, and you also want to rewrite, filter, or remove a subset of the other trees in the repository (feature-a, common) using git filter-branch, you should first rewrite your existing commits into this order:

1. All the common commits from before the project was forked from the template (this will already be the case).
2. All commits modifying build.*, including local changes and merges from your upstream Cradle.
3. Finally, all project-specific commits for feature-* and common.

You can then run git filter-branch safely on the project-specific line of development, without rewriting any of the upstream resource history. If you don't do this, you will probably end up rewriting commits that involve the build scripts, including merge commits from upstream Cradle, which will hurt history traceability and complicate future merges.

Detail

It sounds like you have a golden project template, call it T, and each time you start a new project you fork that repo (either in the traditional GitHub sense, or by creating what will become a divergent clone); call it Pn. So Pn and T start with the same history and common commits (call the branch point Pn-0).

As Pn develops its code-base, other projects might identify improvements to the base project-template infrastructure, and make a change to file F in T. Any project Pn, which might be hundreds of commits ahead of the template, can still merge-up the changes in common files from T.

Now you want to rewrite history in Pn. Since Pn-0 you have made many project-specific commits, then a merge from T, then more project-specific commits. If you rewrite Pn all the way back to Pn-0 in order to filter-branch, the merge history from T is lost because the histories have diverged, and future merges from T become hellish.

Does that describe your problem?

I think you are seeing that using a project-clone-from-template approach has its limitations when you want to have full freedom of history-rewriting to re-organise your project repo. Provided you have history both before and after merge commits from T, you are going to have to do some fancy re-organisation in order to retain a common history. That solution is:

1. Let Tx be the most recent commit of T that you have fully merged into Pn.
2. Fetch T into the Pn repo, and create a branch in Pn that starts at commit Tx.
3. Rebase your current Pn history onto that branch, moving it from a base of Pn-0 (the original common commit with T) to Tx, the latest common commit with T.

This approach will replay your entire history in Pn as if it started with Tx instead of Pn-0, so commit Pn-1 has a new parent Tx. Of course each commit will be re-written, so any existing clones of Pn are immediately orphaned.
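The three steps above can be sketched in a throwaway sandbox. This is only an illustration under assumptions: the repository names (template, project), the branch names (main, tx-base) and the file contents are all made up, and your real Pn-0 and Tx have to be identified in your own history. Note that a plain git rebase does not recreate merge commits, so the old merge from T disappears from the replayed history.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all repo, branch and file names are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# T: the template repository that owns the build scripts.
git init -q template
cd template
git symbolic-ref HEAD refs/heads/main
echo v1 > build.gradle
git add build.gradle
git commit -q -m "T: initial build script"
pn0=$(git rev-parse HEAD)            # Pn-0, the original fork point
cd ..

# Pn: a project forked from T, with a project-specific commit.
git clone -q template project
cd project
mkdir feature-a
echo a > feature-a/a.txt
git add feature-a
git commit -q -m "P: feature-a"
cd ..

# T improves its build script after the fork.
cd template
echo v2 > build.gradle
git commit -q -am "T: improve build script"
cd ..

# Pn merges T, then carries on with project work.
cd project
git fetch -q origin
git merge -q --no-edit origin/main
mkdir feature-b
echo b > feature-b/b.txt
git add feature-b
git commit -q -m "P: feature-b"

# Step 1: Tx is the latest commit of T already merged into Pn.
tx=$(git merge-base main origin/main)
# Step 2: create a branch in Pn that starts at Tx.
git branch tx-base "$tx"
# Step 3: replay everything after Pn-0 on top of Tx. T's own commit is
# also in this range; replaying it onto Tx changes nothing, so Git
# drops or skips it.
git rebase -q --onto tx-base "$pn0" main

git log --oneline main
```

After this, main is a linear history: T's commits up to Tx first, followed by the replayed project commits.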

Once you have this, you are free to run git filter-branch starting with the re-written commit Pn-1, and remove any history of incomplete modules.
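As a rough sketch of that filtering step, the sandbox below starts from an already-rebased history and limits git filter-branch to the commits after Tx, so the template commits keep their original hashes. The branch names (main, tx-base, release-a) and the directory layout are assumptions for illustration, not taken from your repository.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all branch and file names are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Stand-in for the template history ending at Tx.
echo v2 > build.gradle
git add build.gradle
git commit -q -m "T: build script"
git branch tx-base

# Project-specific commits on top of Tx (the rewritten Pn-1 onwards).
mkdir feature-a feature-b
echo a > feature-a/a.txt
git add feature-a
git commit -q -m "P: feature-a"
echo b > feature-b/b.txt
git add feature-b
git commit -q -m "P: feature-b"

# Branch for the feature that is ready, then strip the unfinished
# feature from that branch only. The range tx-base..release-a keeps
# every commit up to Tx out of the rewrite.
git branch release-a main
git filter-branch --prune-empty \
    --index-filter 'git rm -r -q --cached --ignore-unmatch feature-b' \
    -- tx-base..release-a >/dev/null

git log --oneline release-a
```

The main branch is left alone here, so the unfinished features stay available for later branches; filter-branch also keeps a backup of the rewritten ref under refs/original/.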

Now - this is a fair amount of trouble to go to, and rewrites history in tricky ways, but the history will be retained. You wouldn't want to be doing this process every day.

One thing you might want to consider is whether there's any way you can produce and consume your Cradle without source-sharing. It might not be as convenient as Git-merging, but if your template project is version-controlled and you organize your build logic and maybe use shared scripts, you can modularize your template project so you no longer depend on child-projects maintaining common source histories in order to merge-up - they would just consume the latest template binaries instead. Depends a lot of course on what's in the template other than build logic.

answered Oct 11 '22 05:10

javabrett
