
How to rewrite Git history so that all files are in a subdirectory?

I would like to merge multiple Git repositories (let's say repoA and repoB) into one new repository. The new repository (repoNew) should contain repoA and repoB, each in its own subdirectory. Since I have only worked locally so far, I can do whatever I want to the repositories.

Under those circumstances, the standard approach seems to be to use git filter-branch to rewrite the history of both repoA and repoB so that it looks as if they had always been in a subfolder, and then to merge them into repoNew.
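For reference, the merge step could look roughly like the sketch below. It assumes the histories have already been rewritten into their subdirectories, that repoNew already has at least one commit, and that Git 2.9 or later is used (older versions do not know --allow-unrelated-histories, so the flag would have to be dropped there). The helper name import_repo and its arguments are illustrative only.

```shell
# Hypothetical helper: pull the (already rewritten) history of another local
# repository into the current one.
#   $1 = remote name, $2 = path to the other repository, $3 = branch to merge
import_repo() {
    git remote add "$1" "$2" &&
    git fetch -q "$1" &&
    # Git 2.9+ refuses to merge histories without a common ancestor unless
    # this flag is given.
    git merge -q --allow-unrelated-histories -m "Import $1" "$1/$3"
}
```

Run from inside repoNew, this would be called once per source repository, for example `import_repo repoA ../repoA master`.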

The first step is what is bothering me. I am well aware of SO answers such as Dan Moulding's answer to "How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory?", which describes exactly what I want.

He suggested something along the lines of the following:

git filter-branch --prune-empty --tree-filter '
if [[ ! -e repoA ]]; then
    mkdir -p repoA
    git ls-tree --name-only $GIT_COMMIT | xargs -i mv {} repoA
fi'

The result should be that the folder structure under <repoA-GIT-base> is now in <repoA-GIT-base>/repoA. However, this is not the case. The above command fails randomly at different commits with a message like "mv: cannot move 'src' into 'repoA/src'".

How can I avoid these failing commits when rewriting the history as described?

EDIT:

You should consider excluding the .gitignore from the move like so:

git filter-branch --prune-empty --tree-filter '
if [[ ! -e repoA ]]; then
    mkdir -p repoA
    # Move every top-level entry except .gitignore into repoA
    git ls-tree --name-only $GIT_COMMIT |
    grep -v -e "^\.gitignore\$" |
    xargs -i mv {} repoA
fi'

The command still fails, seemingly at random. I tried it several times, and the "unable to move" failure occurred at a different commit each time. I observed that excluding the .gitignore seemed to increase the chance of making it through all commits. I was able to perform the move on all three of my repositories in a row without failure. When I tried it again, just for fun, on another throw-away copy of one of the repositories, it failed again.

Since I also sometimes had difficulty deleting my throw-away copies because a process was allegedly using some files, the problem could have something to do with Windows 7 file access handling, but I am not in a position to make serious assumptions there.

Retrying until it succeeds is of course ridiculous and will probably not work on repositories with a lot of commits (mine only had ~30).
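One variation that avoids moving checked-out files altogether is the --index-filter approach from the git filter-branch documentation, which rewrites the paths directly in the index instead of on disk. The following is only a minimal sketch of that technique, assuming a POSIX shell; the function name move_into_subdir and the SUBDIR and TAB variables are illustrative, and whether it avoids the Windows failures described above is untested.

```shell
# Hypothetical helper: rewrite every commit reachable from HEAD so that all
# tracked paths live under the directory named in $1.
move_into_subdir() {
    # SUBDIR and TAB are passed to the filter through the environment.
    # TAB holds a literal tab so the sed pattern does not depend on "\t".
    # FILTER_BRANCH_SQUELCH_WARNING silences the notice newer Git versions print.
    SUBDIR=$1 TAB=$(printf '\t') FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --prune-empty --index-filter '
        git ls-files -s |
        sed "s|$TAB\"*|&$SUBDIR/|" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
    ' HEAD
}
```

It would be called as `move_into_subdir repoA` from the top level of a clean throw-away clone. The sketch assumes the directory name contains no `|` or `&` characters and that no commit has a completely empty tree, in which case the temporary index file may not be created and the final mv would fail.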

Info: I used git-bash with git version 1.7.10.msysgit.1 on Windows 7 64-Bit Enterprise.

svenhuebner asked Mar 04 '14 11:03



2 Answers

I suspect you're looking for something along the lines of git subhistory. It's a very small project and doesn't seem to be well maintained, but it's also designed to do almost exactly what you describe. Give it a try!

Ajedi32 answered Oct 12 '22 09:10



I wrote a libgit2-based program to filter Git branches for another purpose, and I have changed it slightly to do what you want here. You could try it.

It is in the subdir branch of git_filter on GitHub:

https://github.com/slobobaby/git_filter/tree/subdir

I just tested it on our 100,000-commit repository and it took 43 seconds.

I wrote the program because solutions based on git filter-branch took days to weeks to finish.

The example configuration filters a "test" repository and puts everything in the "test" subdirectory; you can change this to do what you want.

slobobaby answered Oct 12 '22 09:10
