I've been experimenting with using git subtree and have run into the following situation.
I used git subtree to add an external project to my repo. I intentionally kept all of the history for the upstream project, as I want to be able to refer to the project's history and also contribute back to the upstream project later.
As it turns out, another contributor to the upstream project accidentally pushed a large file into the master branch. To fix this, the upstream project rewrote history and force pushed onto master. When creating my "monorepo", I included this commit and I would also like to remove it.
How can I update my repository to reflect the new history of the subtree?
My first attempt was to use filter-branch to completely remove the subtree and all of its history.
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch upstream-project-dir' --prune-empty HEAD
The idea was that once the old version of the subtree was removed, I could re-add the subtree using the new upstream master. However, this didn't work: the upstream commit history still shows up in the git log output.
Update
I've written up the steps to create a minimal reproducible example.
First create an empty git repo.
git init test-monorepo
cd ./test-monorepo
Create an initial commit.
echo hello world > README
git add README
git commit -m 'initial commit'
Now add a subtree for an external project.
git remote add thirdparty [email protected]:teivah/algodeck.git
git fetch thirdparty
git subtree add --prefix algodeck thirdparty master
Make a commit on the monorepo that touches the subtree.
echo dont panic >> algodeck/README.md
git commit -a -m 'test commit'
Now attempt to use git filter-branch to remove the subtree.
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch algodeck' --prune-empty HEAD
Examine the git log output. I am expecting to see only my initial commit.
git log
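The steps above can be replayed offline to see which commits survive the filter. This is only a sketch under a few assumptions: a local stand-in repository (hypothetical, with a single commit) replaces the real upstream, and a manual subtree merge with git read-tree is used in place of git subtree add so that only core git is needed. What it suggests is that the upstream commits keep their files at the repository root rather than under algodeck/, so the index filter never matches them, they are not empty, and --prune-empty leaves them (and the merge commit that joins them) in place.

```shell
#!/bin/sh
# Sketch only: a local stand-in replaces the real upstream so this runs offline.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning and delay

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with one commit; its files live at the repository root.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo algo > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m 'upstream commit'

# Monorepo with an initial commit, as in the steps above.
git init -q test-monorepo
cd test-monorepo
echo hello world > README
git add README
git commit -q -m 'initial commit'

# Manual subtree merge, assumed here to stand in for a non-squashed `git subtree add`.
git remote add thirdparty "$work/upstream"
git fetch -q thirdparty
git merge -q -s ours --no-commit --allow-unrelated-histories thirdparty/master
git read-tree --prefix=algodeck/ -u thirdparty/master
git commit -q -m 'add algodeck subtree'

echo dont panic >> algodeck/README.md
git commit -q -a -m 'test commit'

# Same filter as in the question.
git filter-branch --index-filter \
  'git rm -rf --cached --ignore-unmatch algodeck' --prune-empty HEAD >/dev/null

# 'test commit' is pruned, but the upstream commit and the merge are still listed.
subjects=$(git log --format=%s)
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
printf '%s\n' "$subjects"
echo "parents of HEAD: $parents"
```

Running git log --graph --oneline in the resulting repository shows the upstream line of history still attached through the second parent of the merge commit.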
git subtree split lets you specify a rev other than HEAD, and git push lets you specify a mapping between a local thing and a remote ref. Combining the two, git subtree push lets you specify which local thing to run split on and pushes the result of that split to the remote ref.
Adding a subtree: specify the local directory prefix into which you want to pull the subtree, the remote repository URL of the subtree being pulled in, and the remote branch of the subtree being pulled in. Also specify whether you want to squash all of the subtree's logs.
Subtree split: first you split a new branch from your history containing only the subtree rooted at <prefix>. The new history includes only the commits (including merges) that affected <prefix>. The files that were previously rooted in the subdirectory <prefix> are now at the root of the project.
In your repo, clean up the history of commits for this remote:
git fetch upstream
If one of your own commits includes the large file, rewrite your history so that this large file is no longer referenced:
# using one or more of the following commands:
git rebase --interactive
git filter-branch
...
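Before rewriting, it can help to find out which of your commits actually reference the large file. The following is a minimal sketch in a throwaway repository; the file name big.bin and the commit messages are hypothetical, and in a real repo you would run only the two queries near the end.

```shell
#!/bin/sh
# Sketch only: builds a throwaway repo with a hypothetical large file, big.bin.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > README
git add README
git commit -q -m 'initial commit'
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m 'add large file by accident'
git rm -q big.bin
git commit -q -m 'remove large file'

# Commits reachable from any ref that added, changed, or deleted the path.
touching=$(git log --all --format=%s -- big.bin)

# Largest blob reachable from any ref: object type, size in bytes, path.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' | sort -k2,2nr | head -n 1)

printf 'commits touching big.bin:\n%s\n' "$touching"
printf 'largest blob: %s\n' "$largest"
```

Note that deleting the file in a later commit is not enough: the earlier commit still references the blob, which is why the history has to be rewritten.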
With these two steps, the big file will not be referenced anymore by any commit in your repo.
It will also be deleted from your hard drive at some point, once git runs its garbage collector and the expiration delay for dangling blobs has passed.
If you urgently need to delete this big file from your hard drive, manually run:
git gc --prune=now
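One caveat worth checking: if the rewrite was done with git filter-branch, the old commits usually stay reachable through the backup refs under refs/original/ and through the reflog, so git gc --prune=now alone may not delete the blob. The sketch below follows the shrinking checklist from the git filter-branch documentation in a throwaway repository; the file name big.bin is hypothetical.

```shell
#!/bin/sh
# Sketch only: throwaway repo with a hypothetical large file, big.bin.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning and delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > README
git add README
git commit -q -m 'initial commit'
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m 'add large file by accident'
blob=$(git rev-parse HEAD:big.bin)

# Rewrite history so that no commit on the current branch contains big.bin.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch big.bin' --prune-empty HEAD >/dev/null

# The blob is still in the object store: refs/original/ and the reflog keep it alive.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the filter-branch backup refs, expire every reflog entry, then prune.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```

Tags, other branches, or remote-tracking refs that still point at the old history would keep the blob alive in the same way, so those need to be updated or deleted too.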