
Restructuring Git repo with prototype code to actual publishable project

Tags: git

Lately I've been working on an API library that wraps parts of a relatively large external API in a more idiomatic structure. As I did my API exploration while writing the prototype code, I ended up implementing three of the available sub-APIs with varying degrees of functionality. To put it in simpler terms, I have a project which structurally looks like

dir:root
 └ dir:feature-a
 └ dir:feature-b
 └ dir:feature-c
 └ dir:common
 └ file:build.gradle
 └ file:build.py

where each feature corresponds to one of the sub-APIs. Note that the directories aren't flat; I just omitted the subdirectories for simplicity's sake.

My main problem is that while I did, for once, keep a semi-decent version history, it's all in one branch and only one of the sub-APIs is ready to be released. Ideally, I'd like to find the most convenient way to

  1. Split the existing repo so that each feature becomes its own branch, letting me publish them one by one as they mature
  2. Keep the current version history (with some rebasing, possibly)

I have previously used git filter-branch for a similar purpose, but the one major curveball here is that the repository root is actually another repository. On a meta level the repository has two parents, which admittedly is funky but very useful for keeping the build scripts up to date. If I tried to do what I want with filter-branch, the build scripts at the root of the project would get removed, which is definitely not what I want.

Finally, the common directory is a bit special: I don't mind cutting its version history, as long as its contents are there.

Esko asked May 31 '15 09:05



1 Answer

Summary

If you want to retain the history of some common resources (build.*) and keep those resources easily mergeable in the future, while using git filter-branch to rewrite, filter, or remove a subset of the other trees in the repository (feature-a, common), you should first rewrite your existing commits into this order:

  • All the common commits from before the project was forked from the template (this will already be the case).
  • All commits modifying build.*, including local changes and merges from your upstream Cradle.
  • Finally, all project-specific commits for feature-* and common.

You can then run git filter-branch safely on the project-specific line of development without rewriting any of the upstream resource history. If you don't do this, you will probably end up rewriting commits that involve the build scripts, including merge commits from upstream Cradle, which will hurt history traceability and complicate future merges.

Detail

It sounds like you have a golden project template, call it T, and each time you start a new project you fork that repo (either in the traditional GitHub sense, or by creating what will become a divergent clone); call the fork Pn. So Pn and T start with the same history and common commits (call the branch point Pn-0).

As Pn develops its code base, other projects might identify improvements to the base project-template infrastructure and make a change to file F in T. Any project Pn, which might be hundreds of commits ahead of the template, can still merge up the changes to common files from T.

Now you want to rewrite history in Pn. Since Pn-0 you have made many project-specific commits, then a merge from T, then more project-specific commits. If you had to rewrite Pn all the way back to Pn-0 in order to filter-branch, the merge history from T would be lost, since the histories would have diverged, and future merges from T would become hellish.

Does that describe your problem?

I think you are seeing that the project-clone-from-template approach has its limitations when you want full freedom to rewrite history and reorganise your project repo. Since you have history both before and after merge commits from T, you are going to have to do some fancy reorganisation in order to retain a common history. That solution is:

  • Let Tx be the most recent commit of T that you have fully merged into Pn.
  • Fetch T into the Pn repo, and create a branch in Pn that starts with commit Tx.
  • Rebase your current Pn history onto that branch, moving it from a base of Pn-0 (common commit with T) to Tx, the latest common commit with T.

This approach will replay your entire history in Pn as if it started with Tx instead of Pn-0, so commit Pn-1 has a new parent Tx. Of course each commit will be re-written, so any existing clones of Pn are immediately orphaned.
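
The three steps above can be sketched roughly as follows. This is only a self-contained demo under assumptions, not your actual setup: it builds a throwaway template repo T and a project repo Pn in a temp directory, and the branch names main and template-base, the file contents, and the commit messages are all made up for illustration. In a real project the part that matters is the last section, with Tx and Pn-0 replaced by your own commits.

```shell
#!/bin/sh
# Demo only: every repo, branch name, and file here is hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Template repo T with a single commit.
git init -q "$work/T"
cd "$work/T"
git symbolic-ref HEAD refs/heads/main
echo "v1" > build.gradle
git add build.gradle
git commit -q -m "T: initial build script"

# Project Pn starts as a clone of T; the commit it starts from is Pn-0.
git clone -q "$work/T" "$work/Pn"
cd "$work/Pn"
pn0=$(git rev-parse HEAD)
mkdir feature-a
echo "a" > feature-a/a.txt
git add feature-a
git commit -q -m "Pn-1: add feature-a"

# T moves on, Pn merges the update, then continues with its own work.
cd "$work/T"
echo "v2" > build.gradle
git commit -q -am "T: update build script"
cd "$work/Pn"
git fetch -q origin
git merge -q --no-edit origin/main
echo "a2" >> feature-a/a.txt
git commit -q -am "Pn-2: extend feature-a"

# The steps from the list:
# 1. Tx is the latest template commit that Pn has fully merged.
tx=$(git rev-parse origin/main)
# 2. Create a branch in Pn that starts at Tx.
git branch template-base "$tx"
# 3. Replay everything after Pn-0 on top of Tx. Merge commits are not
#    replayed, and template changes already contained in Tx are dropped.
git rebase -q --onto template-base "$pn0" main

# Inspect the result: project commits now sit directly on top of Tx.
git log --oneline --graph main
```

Note that a plain rebase like this linearises the history, so the old merge commits from T disappear; that is the point here, since Tx already contains what they brought in. On a real history with conflicts you would have to resolve them as the rebase stops.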

Once you have this, you are free to run git filter-branch starting with the re-written commit Pn-1, and remove any history of incomplete modules.
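
As a rough sketch of that filtering step, assuming the history has already been rebased so that everything after Tx is project-specific: the demo below builds a throwaway repo, then creates a branch per publishable feature and filters only the commits after Tx. The branch name release-a, the file names, and the stand-in Tx commit are hypothetical. FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and pause that newer Git versions add to filter-branch.

```shell
#!/bin/sh
# Demo only: every repo, branch name, and file here is hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q "$work/Pn"
cd "$work/Pn"
git symbolic-ref HEAD refs/heads/main

# Stand-in for Tx: the template commit the project history now starts from.
echo "v2" > build.gradle
git add build.gradle
git commit -q -m "Tx: template build script"
tx=$(git rev-parse HEAD)

# Project-specific commits, one per feature directory, plus a follow-up.
for step in feature-a feature-b feature-c; do
  mkdir "$step"
  echo "$step" > "$step/code.txt"
  git add "$step"
  git commit -q -m "add $step"
done
echo "more" >> feature-a/code.txt
git commit -q -am "extend feature-a"

# Branch for the feature that is ready; main keeps the full history.
git branch release-a main

# Rewrite only the commits after Tx on release-a, removing the unfinished
# features from every commit and dropping commits that end up empty.
git filter-branch --prune-empty --index-filter \
  'git rm -r -q --cached --ignore-unmatch feature-b feature-c' \
  -- "$tx"..release-a

# Inspect the result: top-level entries left on the filtered branch.
git ls-tree --name-only release-a
```

Because the range starts after Tx, the template commits keep their original hashes, so merging from T later should still find the shared history. The common directory is not handled here; it would stay in the filtered branch unless you add it to the git rm list.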

Now - this is a fair amount of trouble to go to, and rewrites history in tricky ways, but the history will be retained. You wouldn't want to be doing this process every day.

One thing you might want to consider is whether there's any way you can produce and consume your Cradle without source-sharing. It might not be as convenient as Git merging, but if your template project is version-controlled and you organize your build logic, perhaps using shared scripts, you can modularize the template so that child projects no longer need to maintain a common source history in order to merge up; they would just consume the latest template binaries instead. That depends a lot, of course, on what's in the template other than build logic.

javabrett answered Oct 11 '22 05:10