Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GIT Split Repository directory preserving *move / renames* history

Tags:

git

Let's say you have the repository:

myCode/megaProject/moduleA
myCode/megaProject/moduleB

Over time (months), you re-organise the project. Refactoring the code to make the modules independent. Files in the megaProject directory get moved into their own directories. Emphasis on move - the history of these files is preserved.

myCode/megaProject
myCode/moduleA
myCode/moduleB

Now you wish to move these modules to their own GIT repos. Leaving the original with just megaProject on its own.

myCode/megaProject
newRepoA/moduleA
newRepoB/moduleB

The filter-branch command is documentated to do this but it doesn't follow history when files were moved outside of the target directory. So the history begins when the files were moved into their new directory, not the history the files had then they lived in the old megaProject directory.

How to split a GIT history based on a target directory, and, follow history outside of this path - leaving only commit history related to these files and nothing else?

The numerous other answers on SO focus on generally splitting apart the repo - but make no mention of splitting apart and following the move history.

like image 282
simbolo Avatar asked Jan 02 '16 17:01

simbolo


People also ask

Does Git MV lose history?

So far, we know that git doesn't lose the history of renamed files, but by default doesn't show it.

How do I move files between Git repositories?

Merge the files into the new repository B. Step 2: Go to that directory. Step 3: Create a remote connection to repository A as a branch in repository B. Step 4: Pull files and history from this branch (containing only the directory you want to move) into repository B.


2 Answers

This is a version based on @rksawyer's scripts, but it uses git-filter-repo instead. I found it was much easier to use and much much faster than git-filter-branch (and is now recommended by git as a replacement).

# This script should run in the same folder as the project folder is.
# This script uses git-filter-repo (https://github.com/newren/git-filter-repo).
# The list of files and folders that you want to keep should be named <your_repo_folder_name>_KEEP.txt. I should contain a line end in the last line, otherwise the last file/folder will be skipped.
# The result will be the folder called <your_repo_folder_name>_REWRITE_CLONE. Your original repo won't be changed.
# Tags are not preserved, see line below to preserve tags.
# Running subsequent times will backup the last run in <your_repo_folder_name>_REWRITE_CLONE_BKP.

# Define here the name of the folder containing the repo: 
GIT_REPO="git-test-orig"

clone="$GIT_REPO"_REWRITE_CLONE
temp=/tmp/git_rewrite_temp
rm -Rf "$clone"_BKP
mv "$clone" "$clone"_BKP
rm -Rf "$temp"
mkdir "$temp"
git clone "$GIT_REPO" "$clone"
cd "$clone"
git remote remove origin
open .
open "$temp"

# Comment line below to preserve tags
git tag | xargs git tag -d

echo 'Start logging file history...'
echo "# git log results:\n" > "$temp"/log.txt

while read p
do
    shopt -s dotglob
    find "$p" -type f > "$temp"/temp
    while read f
    do
        echo "## " "$f" >> "$temp"/log.txt
        # print every file and follow to get any previous renames
        # Then remove blank lines.  Then remove every other line to end up with the list of filenames       
        git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0' | tee -a "$temp"/log.txt
        
        echo "\n\n" >> "$temp"/log.txt
    done < "$temp"/temp
done < ../"$GIT_REPO"_KEEP.txt > "$temp"/PRESERVE

mv "$temp"/PRESERVE "$temp"/PRESERVE_full
awk '!a[$0]++' "$temp"/PRESERVE_full > "$temp"/PRESERVE

sort -o "$temp"/PRESERVE "$temp"/PRESERVE

echo 'Starting filter-branch --------------------------'
git filter-repo --paths-from-file "$temp"/PRESERVE --force --replace-refs delete-no-add
echo 'Finished filter-branch --------------------------'

It logs the result of git log into a file in /tmp/git_rewrite_temp/log.txt, so you can get rid of these lines if you don't need a log.txt and want it to run faster.

like image 74
Roberto Avatar answered Nov 04 '22 16:11

Roberto


Running git filter-branch --subdirectory-filter in your cloned repository will remove all commits that don't affect content in that subdirectory, which includes those affecting the files before they were moved.

Instead, you need to use the --index-filter flag with a script to delete all files you're not interested in, and the --prune-empty flag to ignore any commits affecting other content.

There's a blog post from Kevin Deldycke with a good example of this:

git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all

This command effectively checks out each commit in turn, deletes all uninteresting files from the working directory and, if anything has changed from the last commit then it checks it in (rewriting the history as it goes). You would need to tweak that command to delete all files except those in, say, /moduleA, /megaProject/moduleA and the specific files you want to keep from /megaProject.

like image 36
Matthew Strawbridge Avatar answered Nov 04 '22 16:11

Matthew Strawbridge