I've recently been working on an API library that wraps parts of a relatively large external API in a more idiomatic structure. Since I did my API exploration while writing the prototype code, I ended up implementing three of the available sub-APIs with varying degrees of functionality. Put more simply, I have a project whose structure looks like this:
dir:root
└ dir:feature-a
└ dir:feature-b
└ dir:feature-c
└ dir:common
└ file:build.gradle
└ file:build.py
where each feature corresponds to one of the sub-APIs. Note that the directories aren't flat; I just omitted subdirectories for simplicity's sake.
My main problem is that, while I did for once provide a semi-decent version history, it's all in one branch and only one of the sub-APIs is ready to be released. Ideally, I'd like to find the most convenient way to
I have previously used git filter-branch for a similar purpose, but the one major curveball here is that the repository root is itself another repository. At the meta level the repository has two parents, which is admittedly funky but very useful for keeping the build scripts up to date. If I tried to do what I want with filter-branch, the build scripts at the root of the project would be removed, which is definitely not what I want.
Finally, the common directory is a bit special: I don't mind cutting its version history, as long as its contents are there.
Summary
If you want to retain the history of some common resources (build.*) and keep those resources easily mergeable in the future, and you want to rewrite/filter/remove a subset of other trees in the repository (feature-a, common) using git filter-branch, you should first rewrite your existing commits in the order:
1. build.*, including local changes and merges from your upstream Cradle.
2. feature-* and common.
You can then run git filter-branch safely on the project-specific development line, without rewriting any of the upstream resource history. If you don't do this, you will probably end up rewriting commits involving the build scripts, including merge commits from upstream Cradle, which will inhibit history traceability and future merges.
Detail
It sounds like you have a golden project template, call it T, and each time you start a new project, you fork that repo (either in the traditional GitHub sense, or by creating what will become a divergent clone), call it Pn. So Pn and T start with the same history and common commits (call the branch point Pn-0).
As Pn develops its code base, other projects might identify improvements to the base project-template infrastructure and make a change to file F in T. Any project Pn, which might be hundreds of commits ahead of the template, can still merge in the changes to common files from T.
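To make the T/Pn relationship concrete, here is a minimal sketch in a throwaway directory. The remote name template, the branch name main, the helper new_repo and the file contents are all assumptions made for illustration, not something taken from your setup.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Hypothetical helper: create a repo on branch "main" with a throwaway identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/main
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.invalid"
}

# T: the template, holding only the shared build script (file F).
new_repo "$work/T"
echo "build v1" > "$work/T/build.gradle"
git -C "$work/T" add build.gradle
git -C "$work/T" commit -q -m "T: initial build script"

# Pn: starts life as a clone of T; the clone point is Pn-0.
git clone -q -o template "$work/T" "$work/Pn"
git -C "$work/Pn" config user.name "Example"
git -C "$work/Pn" config user.email "example@example.invalid"
mkdir "$work/Pn/feature-a"
echo "feature a" > "$work/Pn/feature-a/api.txt"
git -C "$work/Pn" add feature-a
git -C "$work/Pn" commit -q -m "Pn: add feature-a"

# Someone improves file F in T...
echo "build v2" > "$work/T/build.gradle"
git -C "$work/T" commit -q -am "T: improve build script"

# ...and Pn merges that change in, even though it has diverged from T.
git -C "$work/Pn" fetch -q template
git -C "$work/Pn" merge -q --no-edit template/main
```

After the merge, Pn has both its own feature-a commit and the updated build script, joined by a merge commit. That merge commit is exactly what a naive history rewrite would disturb.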
Now, you want to rewrite history in Pn. Since Pn-0 you have made many project-specific commits, then a merge from T, then more project-specific commits. If you had to rewrite Pn back to Pn-0 in order to filter-branch, the merge history from T would be lost, since the histories have diverged, and future merges from T would become hellish.
Does that describe your problem?
I think you are seeing that a project-clone-from-template approach has its limitations when you want full freedom to rewrite history and reorganise your project repo. Provided you have history both before and after merge commits from T, you are going to have to do some fancy reorganisation in order to retain a common history. That solution is:
1. Let Tx be the most recent commit of T that you have fully merged into Pn.
2. Fetch T into the Pn repo, and create a branch in Pn that starts at commit Tx.
3. Rebase the Pn history onto that branch, moving it from a base of Pn-0 (the original common commit with T) to Tx, the latest common commit with T.
This approach will replay your entire history in Pn as if it started with Tx instead of Pn-0, so commit Pn-1 has a new parent, Tx. Of course each commit will be rewritten, so any existing clones of Pn are immediately orphaned.
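The rebase described above could look roughly like the following on a toy repository. Treat it as a sketch under assumptions: the branch name template-base, the remote name template, the helpers ident and commit_file, and the commit messages are invented here. A plain git rebase does not replay merge commits, so the old merge from T disappears and the project commits end up in a straight line on top of Tx.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Hypothetical helper: give a repo a throwaway identity.
ident() {
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.invalid"
}
# Hypothetical helper: commit_file <repo> <path> <message>
commit_file() {
    mkdir -p "$(dirname "$1/$2")"
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# T: first commit is the future Pn-0, second commit is the future Tx.
git init -q "$work/T"
git -C "$work/T" symbolic-ref HEAD refs/heads/main
ident "$work/T"
commit_file "$work/T" build.gradle "T: initial build script"
pn0=$(git -C "$work/T" rev-parse HEAD)

git clone -q -o template "$work/T" "$work/Pn"
ident "$work/Pn"

echo "build v2" > "$work/T/build.gradle"
git -C "$work/T" commit -q -am "T: improve build script"

# Pn history: project commit, merge from T, another project commit.
commit_file "$work/Pn" feature-a/api.txt "Pn-1: add feature-a"
git -C "$work/Pn" fetch -q template
git -C "$work/Pn" merge -q --no-edit template/main
commit_file "$work/Pn" feature-b/api.txt "Pn-2: add feature-b"

# Step 1: Tx is the latest template commit already merged into Pn.
tx=$(git -C "$work/Pn" rev-parse template/main)
# Step 2: create a branch in Pn that starts at Tx.
git -C "$work/Pn" branch template-base "$tx"
# Step 3: move the Pn history from a base of Pn-0 onto that branch.
git -C "$work/Pn" rebase -q --onto template-base "$pn0" main
```

In this toy case nothing conflicts. In a real repository, expect to resolve conflicts wherever project commits touched the same files as the template.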
Once you have this, you are free to run git filter-branch starting with the rewritten commit Pn-1, and remove any history of incomplete modules.
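As a rough illustration of that last step, the sketch below removes one unfinished module from every commit after Tx while leaving Tx itself untouched. It assumes feature-b is the module to drop and uses an --index-filter with git rm; the repository layout, the helper commit_file and the commit messages are made up for the example, and your real filter may need to be different.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
repo="$work/Pn"

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.invalid"

# Hypothetical helper: commit_file <path> <message>
commit_file() {
    mkdir -p "$(dirname "$repo/$1")"
    echo "$1" > "$repo/$1"
    git -C "$repo" add "$1"
    git -C "$repo" commit -q -m "$2"
}

# Stand-in for the already rebased history: Tx followed by project commits.
commit_file build.gradle "Tx: template build script"
tx=$(git -C "$repo" rev-parse HEAD)
commit_file feature-a/api.txt "Pn-1: add feature-a"
commit_file feature-b/api.txt "Pn-2: add feature-b (not ready)"
commit_file common/util.txt "Pn-3: add common"

# Rewrite only the commits after Tx. feature-b is removed from the index of
# each commit, and commits left with no changes are pruned.
FILTER_BRANCH_SQUELCH_WARNING=1 git -C "$repo" filter-branch -f --prune-empty \
    --index-filter 'git rm -r -q --cached --ignore-unmatch feature-b' \
    -- "$tx"..main >/dev/null
```

Because the revision range starts after Tx, the template commit keeps its original hash, so later merges from T still find a common ancestor. filter-branch keeps a backup of the old branch under refs/original in case the result is not what you wanted.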
Now - this is a fair amount of trouble to go to, and rewrites history in tricky ways, but the history will be retained. You wouldn't want to be doing this process every day.
One thing you might want to consider is whether there's any way you can produce and consume your Cradle without source-sharing. It might not be as convenient as Git-merging, but if your template project is version-controlled and you organize your build logic and maybe use shared scripts, you can modularize your template project so you no longer depend on child-projects maintaining common source histories in order to merge-up - they would just consume the latest template binaries instead. Depends a lot of course on what's in the template other than build logic.