Let's say you have the repository:
myCode/megaProject/moduleA
myCode/megaProject/moduleB
Over time (months), you re-organise the project, refactoring the code to make the modules independent. Files in the megaProject directory get moved into their own directories. Emphasis on move - the history of these files is preserved.
myCode/megaProject
myCode/moduleA
myCode/moduleB
Now you wish to move these modules to their own Git repos, leaving the original with just megaProject on its own.
myCode/megaProject
newRepoA/moduleA
newRepoB/moduleB
The filter-branch command is documented to do this, but it doesn't follow history when files were moved in from outside the target directory. So the history begins when the files were moved into their new directory, and omits the history the files had when they lived in the old megaProject directory.
How can I split a Git history based on a target directory and follow history outside of this path, leaving only the commit history related to these files and nothing else?
The numerous other answers on SO focus on generally splitting apart the repo - but make no mention of splitting apart and following the move history.
So far, we know that git doesn't lose the history of renamed files, but by default doesn't show it.
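To make that point concrete, here is a minimal sketch that builds a throwaway repository, moves a file, and compares plain git log with git log --follow. The file names and the demo identity are hypothetical and only mirror the layout in the question.

```shell
#!/usr/bin/env bash
# Sketch: history of a moved file with and without --follow.
# Paths and the committer identity are made up for illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper so the demo does not depend on a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Commit 1: the file lives inside megaProject.
mkdir -p megaProject/moduleA
printf 'line one\nline two\n' > megaProject/moduleA/a.txt
g add .
g commit -q -m 'Add a.txt inside megaProject'

# Commit 2: the file is moved to its own top-level directory.
mkdir moduleA
git mv megaProject/moduleA/a.txt moduleA/a.txt
g commit -q -m 'Move moduleA out of megaProject'

# Plain log only sees commits that touched the new path.
plain=$(git log --oneline -- moduleA/a.txt | wc -l | tr -d ' ')
# --follow continues across the rename to the old path.
followed=$(git log --oneline --follow -- moduleA/a.txt | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s)"
echo "with --follow:    $followed commit(s)"
```

Note that --follow works on a single file at a time, which is why the script further down loops over every file individually.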
Merge the files into the new repository B.
Step 2: Go to that directory.
Step 3: Create a remote connection to repository A as a branch in repository B.
Step 4: Pull files and history from this branch (containing only the directory you want to move) into repository B.
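The remote-and-pull steps might look roughly like the sketch below. It is an illustration only: the repository paths, the remote name repoA-source, and the branch name main are all assumptions, and repository A here simply stands in for a clone that has already been reduced to the directory you want to move.

```shell
#!/usr/bin/env bash
# Sketch: pull the history of repository A into repository B via a temporary remote.
# All names (repoA, repoB, repoA-source, main) are hypothetical.
set -eu

work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Repository A: stands in for a clone containing only the directory to move.
git init -q "$work/repoA"
cd "$work/repoA"
git symbolic-ref HEAD refs/heads/main
mkdir moduleA
echo 'v1' > moduleA/a.txt
g add .
g commit -q -m 'Add moduleA'
echo 'v2' >> moduleA/a.txt
g commit -q -am 'Update moduleA'

# Repository B: the destination, with its own unrelated history.
git init -q "$work/repoB"
cd "$work/repoB"
git symbolic-ref HEAD refs/heads/main
echo 'B' > README
g add .
g commit -q -m 'Initial commit in B'

# Create a remote connection to repository A, pull its files and history,
# then drop the remote again. The histories share no ancestor, so git
# needs --allow-unrelated-histories to merge them.
git remote add repoA-source "$work/repoA"
g pull -q --no-rebase --no-edit --allow-unrelated-histories repoA-source main
git remote remove repoA-source

total_commits=$(git rev-list --count HEAD)
module_commits=$(git log --oneline -- moduleA/a.txt | wc -l | tr -d ' ')
echo "commits in B after pull: $total_commits"
echo "commits touching moduleA/a.txt: $module_commits"
```

The pull creates a merge commit joining the two histories, so the commits from A keep their original hashes and messages inside B.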
This is a version based on @rksawyer's scripts, but it uses git-filter-repo instead. I found it was much easier to use and much much faster than git-filter-branch (and is now recommended by git as a replacement).
# This script should run in the same folder as the project folder is.
# This script uses git-filter-repo (https://github.com/newren/git-filter-repo).
# The list of files and folders that you want to keep should be named <your_repo_folder_name>_KEEP.txt. It must end with a newline after the last entry, otherwise the last file/folder will be skipped.
# The result will be the folder called <your_repo_folder_name>_REWRITE_CLONE. Your original repo won't be changed.
# Tags are not preserved, see line below to preserve tags.
# Running subsequent times will backup the last run in <your_repo_folder_name>_REWRITE_CLONE_BKP.
# Define here the name of the folder containing the repo:
GIT_REPO="git-test-orig"
clone="$GIT_REPO"_REWRITE_CLONE
temp=/tmp/git_rewrite_temp
rm -Rf "$clone"_BKP
mv "$clone" "$clone"_BKP
rm -Rf "$temp"
mkdir "$temp"
git clone "$GIT_REPO" "$clone"
cd "$clone"
git remote remove origin
open .
open "$temp"
# Comment line below to preserve tags
git tag | xargs git tag -d
echo 'Start logging file history...'
# printf is used because bash's echo does not expand \n unless called with -e
printf '# git log results:\n\n' > "$temp"/log.txt
while IFS= read -r p
do
shopt -s dotglob
find "$p" -type f > "$temp"/temp
while IFS= read -r f
do
echo "## " "$f" >> "$temp"/log.txt
# print every file and follow to get any previous renames
# Then remove blank lines. Then remove every other line to end up with the list of filenames
git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0' | tee -a "$temp"/log.txt
printf '\n\n' >> "$temp"/log.txt
done < "$temp"/temp
done < ../"$GIT_REPO"_KEEP.txt > "$temp"/PRESERVE
mv "$temp"/PRESERVE "$temp"/PRESERVE_full
awk '!a[$0]++' "$temp"/PRESERVE_full > "$temp"/PRESERVE
sort -o "$temp"/PRESERVE "$temp"/PRESERVE
echo 'Starting filter-repo --------------------------'
git filter-repo --paths-from-file "$temp"/PRESERVE --force --replace-refs delete-no-add
echo 'Finished filter-repo --------------------------'
It logs the result of git log into a file in /tmp/git_rewrite_temp/log.txt, so you can get rid of these lines if you don't need a log.txt and want it to run faster.
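The script's header warns that the last entry of the KEEP file is skipped when the file has no trailing newline. The sketch below reproduces that behaviour of the shell's read builtin and shows one common guard against it. The file name repo_KEEP.txt and its two entries are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: why a list file without a final newline loses its last entry,
# and a common way to guard the loop. File name and entries are made up.
set -eu

tmp=$(mktemp -d)
# Two entries, deliberately written WITHOUT a newline after the last one.
printf 'moduleA\nmegaProject/moduleA' > "$tmp/repo_KEEP.txt"

# Plain loop: read fails at end-of-file on the partial last line,
# so the loop body never runs for it.
naive=0
while IFS= read -r p; do
  naive=$((naive + 1))
done < "$tmp/repo_KEEP.txt"

# Guarded loop: read still fills $p with the partial line before failing,
# so testing for a non-empty $p lets the body run one more time.
guarded=0
while IFS= read -r p || [ -n "$p" ]; do
  guarded=$((guarded + 1))
done < "$tmp/repo_KEEP.txt"

echo "entries seen by plain loop:   $naive"
echo "entries seen by guarded loop: $guarded"
```

Either fix works: end the KEEP file with a newline, or use the guarded form of the loop.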
Running git filter-branch --subdirectory-filter in your cloned repository will remove all commits that don't affect content in that subdirectory, which includes those affecting the files before they were moved.
Instead, you need to use the --index-filter flag (or the slower --tree-filter, which the example below uses) with a script to delete all files you're not interested in, and the --prune-empty flag to drop any commits that only affect other content.
There's a blog post from Kevin Deldycke with a good example of this:
git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all
This command effectively checks out each commit in turn, deletes all uninteresting files from the working directory and, if anything has changed from the last commit, checks it in (rewriting the history as it goes). You would need to tweak that command to delete all files except those in, say, /moduleA, /megaProject/moduleA and the specific files you want to keep from /megaProject.
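As a rough illustration of that tweak, the sketch below uses --index-filter to keep only moduleA and megaProject/moduleA in a throwaway repository laid out like the one in the question. It is not the blog post's command: the file names are hypothetical, and the pipeline assumes GNU grep (-z) and GNU xargs (-r).

```shell
#!/usr/bin/env bash
# Sketch: keep only moduleA (old and new location) with filter-branch --index-filter.
# Repository layout and file names are made up; assumes GNU grep and xargs.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Commit 1: both modules live under megaProject.
mkdir -p megaProject/moduleA megaProject/moduleB
echo a > megaProject/moduleA/a.txt
echo b > megaProject/moduleB/b.txt
g add .
g commit -q -m 'Add both modules under megaProject'

# Commit 2: both modules are moved to the top level.
git mv megaProject/moduleA moduleA
git mv megaProject/moduleB moduleB
g commit -q -m 'Move modules to top level'

# Commit 3: touches moduleB only, so it should vanish after filtering.
echo more >> moduleB/b.txt
g commit -q -am 'Change moduleB only'

# Silence the deprecation notice that filter-branch prints before starting.
export FILTER_BRANCH_SQUELCH_WARNING=1

# For every commit: list the index, select everything that is NOT under
# one of the two kept prefixes, and remove those entries from the index.
g filter-branch --prune-empty --index-filter '
  git ls-files -z |
  grep -zv -e "^moduleA/" -e "^megaProject/moduleA/" |
  xargs -0 -r git rm -q --cached --
' -- --all

remaining_commits=$(git rev-list --count HEAD)
remaining_files=$(git ls-files)
echo "commits left: $remaining_commits"
echo "files left:   $remaining_files"
```

Because --index-filter never checks files out, it is usually much faster than --tree-filter on large histories. Run anything like this on a fresh clone, since the rewrite changes every affected commit hash.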