What I want is similar to this question. However, I want the directory that is split into a separate repo to remain a subdirectory in that repo:
I have this:
foo/
  .git/
  bar/
  baz/
  qux/
And I want to split it into two completely independent repositories:
foo/
  .git/
  bar/
  baz/
quux/
  .git/
  qux/  # Note: still a subdirectory
How can I do this in git?
I could use the method from this answer if there is some way to move all the new repo's contents into a subdirectory, throughout history.
You could indeed use the subdirectory filter followed by an index filter to put the contents back into a subdirectory, but why bother, when you could just use the index filter by itself?
Here's an example from the man page:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
This just removes one filename; what you want to do is remove everything but a given subdirectory. If you want to be cautious, you could explicitly list each path to remove, but if you want to just go all-in, you can just do something like this:
git filter-branch --index-filter 'git ls-tree -z --name-only --full-tree $GIT_COMMIT | grep -zv "^directory-to-keep$" | xargs -0 git rm --cached -r' -- --all
I expect there's probably a more elegant way; if anyone has something please suggest it!
A few notes on that command:
* You might not expect --full-tree to be necessary, but apparently filter-branch runs the index-filter from the .git-rewrite/t directory instead of the top level of the repo.
* --all applies this to all refs; I figure you really do want that. (The -- separates it from the filter-branch options.)
* -z and -0 tell ls-tree, grep, and xargs to use NUL termination to handle spaces in filenames.

Edit, much later: Thomas helpfully suggested a way to remove the now-empty commits, but it's now out of date. Look at the edit history if you've got an old version of git, but with modern git, all you need to do is tack on this option:
--prune-empty
That'll remove all commits which are empty after the application of the index filter.
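To make the -z / -0 note concrete, here is a small sketch that runs the same grep filter on a hand-made list instead of a real commit. The list_entries function and its entry names are made up for illustration; it only stands in for the ls-tree call. Instead of running git rm, it prints each argument xargs would pass, so you can see that a name with a space stays in one piece. It assumes a grep that supports -z (GNU grep does).

```shell
#!/bin/bash
# Stand-in for `git ls-tree -z --name-only --full-tree $GIT_COMMIT`:
# prints hypothetical top-level entries, each terminated by a NUL byte.
list_entries() {
    printf '%s\0' 'bar' 'directory-to-keep' 'my notes' 'directory-to-keep-old'
}

# Same grep filter as in the filter-branch command. xargs -0 then hands each
# surviving name to the command as a single argument; printing the arguments
# in angle brackets (instead of running `git rm`) makes the boundaries visible.
to_remove=$(list_entries | grep -zv '^directory-to-keep$' | xargs -0 printf '<%s>\n')
printf '%s\n' "$to_remove"
```

Only the exact name directory-to-keep is filtered out of the removal list; directory-to-keep-old and "my notes" are both still passed on, each as one argument.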
I wanted to do something similar, but since the list of files that I wanted to keep was pretty long, it didn't make sense to do this with countless greps. I wrote a script that reads the list of files to keep from a file:
#!/bin/bash
# usage:
#   git filter-branch --prune-empty --index-filter \
#     'this-script file-with-list-of-files-to-be-kept' -- --all

if [ -z "$1" ]; then
    echo "Too few arguments."
    echo "Please specify an absolute path to the file"
    echo "which contains the list of files that should"
    echo "remain in the repository after filtering."
    exit 1
fi

# save a list of files present in the commit
# which is currently being modified.
git ls-tree -r --name-only --full-tree "$GIT_COMMIT" > files.txt

# drop from that list every file that should be kept
# (each line of the keep list is used as a grep pattern)
while read -r string; do
    grep -v "$string" files.txt > files.txt.temp
    mv -f files.txt.temp files.txt
done < "$1"

# remove unwanted files (i.e. everything that remained in the list).
# warning: 'git rm' will exit with non-zero status if it gets
# an invalid (non-existent) filename OR if it gets no arguments.
# If something exits with non-zero status, filter-branch will abort.
# That's why we have to check carefully what is passed to git rm.
if [ "$(cat files.txt)" != "" ]; then
    # enclose filenames in "" in case they contain spaces
    sed -e 's/^/"/g' -e 's/$/"/g' files.txt | \
        xargs git rm --cached --quiet
fi
Quite surprisingly, this turned out to be much more work than I initially expected, so I decided to post it here.
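One thing to watch with the loop in that script: each line of the keep list is treated as a regular expression and matched anywhere in the path, so a short entry can protect more files than intended. The sketch below compares the loop with a single grep call using -F (literal strings), -x (whole-line match) and -f (patterns from a file). This is a swapped-in alternative, not what the script above does, and the file names are hypothetical.

```shell
#!/bin/bash
tmp=$(mktemp -d)

# Hypothetical contents of one commit and a hypothetical keep list.
printf '%s\n' 'a.txt' 'data.txt' 'read me.md' > "$tmp/files.txt"
printf '%s\n' 'a.txt' > "$tmp/keep.txt"

# Loop as in the script: every keep-list line is a regex matched anywhere in the name.
cp "$tmp/files.txt" "$tmp/loop.txt"
while read -r pattern; do
    grep -v "$pattern" "$tmp/loop.txt" > "$tmp/loop.txt.temp"
    mv -f "$tmp/loop.txt.temp" "$tmp/loop.txt"
done < "$tmp/keep.txt"
loop_remove=$(cat "$tmp/loop.txt")

# Alternative: exact, literal, whole-line matching against the keep list.
exact_remove=$(grep -vxFf "$tmp/keep.txt" "$tmp/files.txt")

printf 'loop would remove:\n%s\n' "$loop_remove"
printf 'exact match would remove:\n%s\n' "$exact_remove"
rm -rf "$tmp"
```

With the loop, data.txt is silently kept because it contains "a.txt"; with exact matching, only a.txt itself is kept. Which behavior you want depends on whether your keep list is meant to hold patterns or full paths.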
This is what I ended up doing to solve this issue when I had it myself:
git filter-branch --index-filter \
    'git ls-tree --name-only --full-tree $GIT_COMMIT | \
    grep -v "^directory-to-keep$" | \
    sed -e "s/^/\"/g" -e "s/$/\"/g" | \
    xargs git rm --cached -r -f --ignore-unmatch \
    ' \
    --prune-empty -- --all
The solution is based on Jefromi’s answer and on Detach (move) subdirectory into separate Git repository plus many comments here on SO.
The reason Jefromi’s solution did not work for me was that I had files and folders in my repo whose names contained special characters (mostly spaces). Additionally, git rm complained about unmatched files (resolved with --ignore-unmatch).
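If you are wondering what the sed step buys you, here is a minimal sketch with two made-up names. It prints the arguments xargs would pass on instead of calling git rm. As far as I know, a name that itself contains a double quote would still break this approach.

```shell
#!/bin/bash
# Hypothetical names as `git ls-tree --name-only` would print them, one per line.
names=$(printf '%s\n' 'plain' 'with space')

# Without quoting, xargs splits on every blank, so one name becomes two arguments.
unquoted=$(printf '%s\n' "$names" | xargs printf '<%s>\n')

# The sed step wraps each line in double quotes, which xargs reads as one argument.
quoted=$(printf '%s\n' "$names" | sed -e 's/^/"/' -e 's/$/"/' | xargs printf '<%s>\n')

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted run yields three arguments (plain, with, space), while the quoted run yields two (plain, with space).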
You can keep the filtering agnostic to the directory not being in the repo’s root or being moved around:
grep --invert-match "^.*directory-to-keep$"
And finally, you can use this to filter out a fixed subset of files or directories:
egrep --invert-match "^(.*file-or-directory-to-keep-1$|.*file-or-directory-to-keep-2$|…)"
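You can dry-run a pattern like that on a plain list before letting filter-branch loose. This is only a sketch with hypothetical entry names, and it uses grep -E, which behaves like egrep here. Note that the leading .* means anything ending in the kept name is kept as well.

```shell
#!/bin/bash
# Hypothetical top-level entries of a commit, one per line.
entries=$(printf '%s\n' 'bar' 'baz' 'qux' 'old-qux' 'README')

# Everything that does NOT match one of the keep patterns would be removed.
to_remove=$(printf '%s\n' "$entries" | grep -E --invert-match '^(.*baz$|.*qux$)')
printf '%s\n' "$to_remove"
```

Here bar and README end up on the removal list, while baz, qux and also old-qux are kept.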
To clean up afterwards you can use these commands:
$ git reset --hard
$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
Use git-filter-repo. It is not part of git as of version 2.25, and it requires Python 3 (>= 3.5) and git 2.22.0 or later.
# git clone creates the target directories, so no mkdir is needed
git clone originalRepo new_repoA
git clone originalRepo new_repoB
pushd new_repoA
git filter-repo --path foo/bar --path foo/baz
popd
pushd new_repoB
git filter-repo --path foo/qux
popd
For my repo that contained ~12000 commits git-filter-branch took more than 24 hours and git-filter-repo took less than a minute.