I currently have a big git repository that contains many projects, each one in its own subdirectory. I need to split it into individual repositories, each project in its own repo.
I tried: git filter-branch --prune-empty --subdirectory-filter PROJECT master
However, many project directories were renamed several times during their lives, and git filter-branch does not follow renames, so the extracted repo has no history prior to the last rename.
How can I effectively extract a subdirectory from one big git repo, and follow all that directory's renames back into the past?
Merge the files into the new repository B. Step 2: Go to that directory. Step 3: Create a remote connection to repository A as a branch in repository B. Step 4: Pull files and history from this branch (containing only the directory you want to move) into repository B.
Thanks to @Chronial, I was able to cook up a script that massages my git repo according to my needs:
    git filter-branch --prune-empty --index-filter '
        # Delete files which are NOT needed
        git ls-files -z | egrep -zv "^(NAME1|NAME2|NAME3)" |
            xargs -0 -r git rm --cached -q
        # Move files to root directory
        git ls-files -s | sed -e "s-\t\(NAME1\|NAME2\|NAME3\)/-\t-" |
            GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
            ( test ! -f "$GIT_INDEX_FILE.new" \
                || mv -f "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" )
    '
In short, the script does the following:
Deletes all files outside of the three directories NAME1, NAME2 or NAME3 that I need (one project was renamed NAME1 -> NAME2 -> NAME3 during its lifetime).
Moves everything inside these three directories to the root of the repository.
I needed to test whether "$GIT_INDEX_FILE.new" exists because importing from svn into git creates commits without any files (directory-only commits); for those commits no new index file is written, so the mv would fail. The test is needed only if the repo was originally created with 'git svn clone'.
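To see what the "move files to root directory" step does in isolation, the same sed expression can be run on a few hand-written lines in the format that git ls-files -s prints. This is only an illustrative sketch: the function name strip_prefix, the object IDs and the file paths are made up for the example, and the \t and \| escapes assume GNU sed.

```shell
#!/bin/sh
# Strip a leading NAME1/, NAME2/ or NAME3/ from the path column of
# `git ls-files -s` output, whose lines look like
#   <mode> <object> <stage><TAB><path>
# Same sed expression as in the index filter above; assumes GNU sed.
strip_prefix() {
    sed -e "s-\t\(NAME1\|NAME2\|NAME3\)/-\t-"
}

# Hypothetical index entries; the object IDs are placeholders.
printf '100644 %s 0\t%s\n' \
    1111111111111111111111111111111111111111 NAME1/src/main.c \
    2222222222222222222222222222222222222222 NAME3/README \
    3333333333333333333333333333333333333333 NAME2/NAME1/nested.txt |
    strip_prefix
```

The three paths come out as src/main.c, README and NAME1/nested.txt: only the directory name directly after the tab is removed, so a nested directory that happens to share a name is left alone.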
I had a very large repository from which I needed to extract a single folder; even --index-filter was predicted to take 8 hours to finish. Here's what I did instead:
First, list all the names the folder has had over its history; in this example there are two, old-name and new-name.
For each name:
$ git checkout master
$ git checkout -b filter-old-name
$ git filter-branch --subdirectory-filter old-name
This will give you several disconnected branches, each containing history for one of the names.
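If there are more than a couple of names, the three commands can be wrapped in a loop. The following is a rough sketch rather than part of the original recipe: the function name filter_each_name is hypothetical, and the -f flag is my addition, on the assumption that git filter-branch refuses to start while a backup from a previous run still exists under refs/original/.

```shell
#!/bin/sh
# Create one filter-<name> branch per historical folder name.
# Usage: filter_each_name <base-branch> <name>...
# (function name and argument order are assumptions of this sketch)
filter_each_name() {
    base=$1
    shift
    for name in "$@"; do
        git checkout -q "$base" || return 1
        git checkout -q -b "filter-$name" || return 1
        # -f overwrites the refs/original/ backup left by the previous
        # iteration; without it the second run is expected to abort.
        git filter-branch -f --subdirectory-filter "$name" || return 1
    done
}
```

For the example in this answer it would be called as filter_each_name master old-name new-name. Run it on a throwaway clone, since every iteration rewrites history.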
The filter-old-name branch should end with the commit which renamed the folder, and the filter-new-name branch should begin with the same commit. (The same applies if there was more than one rename: you'll wind up with an equivalent number of branches, each with a commit shared with the next one along.) On the old-name branch that commit deletes everything, and on the new-name branch it recreates everything. Make sure that what one deletes is identical to what the other recreates; if it isn't, some file was modified in addition to being renamed, and you will need to merge the changes. (In my case I didn't have this problem so I don't know how to solve it.)
An easy way to check this is to try rebasing filter-new-name on top of filter-old-name and then squashing the two commits together: git should complain that this produces an empty commit. (Note that you will want to do this on a spare branch and then delete it: rebasing rewrites the committer information of the commits, thus losing some of the history you want to keep.)
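A check that avoids rebasing altogether is to compare tree IDs, since two commits with identical contents point at the same tree object. This is a sketch of that alternative, not what I actually ran; the function name same_tree is made up, and the revisions in the usage note assume the branch names used above and a linear filtered history.

```shell
#!/bin/sh
# Succeed (exit 0) if two commits have exactly the same contents,
# i.e. they reference the same tree object.
# Exit 1 if the trees differ, 2 if either revision cannot be resolved.
same_tree() {
    a=$(git rev-parse --verify -q "$1^{tree}") || return 2
    b=$(git rev-parse --verify -q "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}
```

Here the comparison would be between the last commit before the deletion on the old branch and the first commit of the new branch, for example same_tree filter-old-name~1 "$(git rev-list --max-parents=0 filter-new-name)" && echo identical.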
The next step is to graft the two branches together, skipping the two commits which renamed the folder. (Otherwise there will be a weird jump where everything is deleted and recreated.) This involves finding the full SHA (all 40 characters!) of two commits and appending them to git's grafts file, .git/info/grafts: first the second commit of the new-name branch, then the penultimate commit of the old-name branch, which becomes its parent.
$ echo $NEW_NAME_SECOND_COMMIT_SHA1 $OLD_NAME_PENULTIMATE_COMMIT_SHA1 >> .git/info/grafts
If you've done this right, git log --graph should now show a line from the end of the new history to the start of the old history.
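The two SHAs can also be looked up by a small helper instead of being copied by hand. This is a sketch under assumptions: the function name graft_line is hypothetical, and it relies on both filtered branches having linear history, so that the second-oldest commit of the new branch is well defined.

```shell
#!/bin/sh
# Print a grafts-file line "<child> <parent>" that attaches the second
# commit of the new branch to the penultimate commit of the old branch.
# Usage: graft_line <old-branch> <new-branch>
graft_line() {
    # Penultimate commit of the old branch (full SHA).
    old_parent=$(git rev-parse --verify -q "$1~1") || return 1
    # Second-oldest commit of the new branch (assumes linear history).
    new_second=$(git rev-list --reverse "$2" | sed -n 2p)
    [ -n "$new_second" ] || return 1
    printf '%s %s\n' "$new_second" "$old_parent"
}
```

With the branch names from above, graft_line filter-old-name filter-new-name >> .git/info/grafts appends the line. As far as I know, newer Git releases deprecate the grafts file in favour of git replace --graft <commit> <parent>, which should be usable with the same two SHAs, but I have not tested that route here.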
This graft is currently temporary: it is not yet part of the history, and won't follow along with clones or pushes. To make it permanent:
$ git filter-branch
This will refilter the branch without trying to make any further changes, making the graft permanent (changing all of the commits in the filter-new-name branch). You should now be able to delete the .git/info/grafts file.
At the end of all of this, you should now have on the filter-new-name branch all of the history from both names for the folder. You can then use this separate repository, or merge it into another one, or whatever you'd like to do with this history.