
How do I fix a git subtree after the upstream project force pushed onto master?

Tags: git, subtree

I've been experimenting with using git subtree and have run into the following situation.

I used git subtree to add an external project to my repo. I intentionally kept all of the upstream project's history, because I want to be able to refer to it and also to contribute back to the upstream project later.

As it turns out, another contributor to the upstream project accidentally pushed a large file into the master branch. To fix this, the upstream project rewrote history and force pushed onto master. When creating my "monorepo", I included this commit and I would also like to remove it.

How can I update my repository to reflect the new history of the subtree?

My first attempt was to use filter-branch to completely remove the subtree and all history.

git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch upstream-project-dir' --prune-empty HEAD

Once the old version of the subtree was removed, I could re-add the subtree using the new upstream master. However, this didn't work because for some reason the commit history still shows up in the git log output.

Update

I've written up the steps to create a minimal reproducible example.

  1. First create an empty git repo.

    git init test-monorepo
    cd ./test-monorepo
    
  2. Create an initial commit.

    echo hello world > README
    git add README
    git commit -m 'initial commit'
    
  3. Now add a subtree for an external project.

    git remote add thirdparty git@github.com:teivah/algodeck.git
    git fetch thirdparty
    git subtree add --prefix algodeck thirdparty master
    
  4. Make some commits on the monorepo

    echo dont panic >> algodeck/README.md
    git commit -a -m 'test commit'
    
  5. Now attempt to use git filter-branch to remove the subtree.

    git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch algodeck' --prune-empty HEAD
    
  6. Examine the git log output. I expect to see only my initial commit.

    git log
    
asked Nov 07 '19 14:11 by csnate



1 Answer

  1. In your repo, fetch from the remote so that your remote-tracking branch picks up the rewritten upstream history:

    git fetch upstream
    
  2. If any commit in your own history still includes the large file, rewrite your history so that the file is no longer referenced:

    # using one or more of the following commands :
    git rebase --interactive
    git filter-branch
    ...
    
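As an illustration of step 2, here is a minimal sketch that removes one file from a branch's history with `git filter-branch`. It runs in a throwaway repository created with `mktemp`; the file name `big.bin`, the commit messages, and the demo identity are hypothetical stand-ins, not taken from the question.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo hello > README
git add README
git commit -q -m 'initial commit'

# Hypothetical stand-in for the accidentally committed large file.
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m 'add big file by mistake'

echo more >> README
git commit -q -a -m 'later work'

# Newer git versions print a warning and pause before filter-branch starts;
# this variable silences that and is ignored by older versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Drop big.bin from every commit reachable from HEAD, and drop commits
# that become empty as a result.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch big.bin' \
  --prune-empty HEAD >/dev/null

echo "commits on HEAD touching big.bin: $(git rev-list --count HEAD -- big.bin)"
echo "commits on HEAD: $(git rev-list --count HEAD)"
```

Note that `git filter-branch` keeps a backup of the old branch under `refs/original/`, so the old commits (and the file) remain in the object store until that backup ref is deleted.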

After these two steps, no commit in your repo will reference the big file any more.
Git will then delete it from your hard drive at some point, once the garbage collector runs and the expiration delay for dangling blobs has passed.


If you urgently need to delete the big file from your hard drive, run the garbage collector manually:

git gc --prune=now
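
One caveat worth illustrating: `git gc --prune=now` only deletes objects that nothing refers to, and reflog entries (as well as the `refs/original/` backup left by `git filter-branch`) still count as references. The sketch below, run in a throwaway repository with a hypothetical `big.bin`, shows a blob surviving a plain prune and then disappearing once the reflog has been expired.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo hello > README
git add README
git commit -q -m 'initial commit'

# Commit a hypothetical large file, remember its blob id, then drop the commit.
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m 'add big file'
blob=$(git rev-parse HEAD:big.bin)
git reset -q --hard HEAD~1

# The reflog still points at the dropped commit, so the blob survives.
git gc -q --prune=now
git cat-file -e "$blob" && echo "still present after plain gc"

# Expire the reflog entries first, then prune again.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then
  echo "blob still present"
else
  echo "blob removed"
fi
```

Expiring the reflog discards your safety net for recovering old commits, so it is best done only once you are sure the rewritten history is what you want.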

answered Oct 21 '22 05:10 by LeGEC