How to split a git repository and follow directory renames?

I currently have a big git repository that contains many projects, each one in its own subdirectory. I need to split it into individual repositories, each project in its own repo.

I tried:

    git filter-branch --prune-empty --subdirectory-filter PROJECT master

However, many project directories went through several renames in their lives, and git filter-branch does not follow renames, so effectively the extracted repo does not have any history prior to the last rename.

How can I effectively extract a subdirectory from one big git repo, and follow all that directory's renames back into the past?

haimg asked Feb 07 '13 19:02



2 Answers

Thanks to @Chronial, I was able to cook up a script that massages my git repo the way I need:

    git filter-branch --prune-empty --index-filter '
        # Delete files which are NOT needed
        git ls-files -z | egrep -zv "^(NAME1|NAME2|NAME3)" |
            xargs -0 -r git rm --cached -q

        # Move files to root directory
        git ls-files -s | sed -e "s-\t\(NAME1\|NAME2\|NAME3\)/-\t-" |
            GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
            ( test ! -f "$GIT_INDEX_FILE.new" \
                || mv -f "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" )
    '

Here is what it does:

  1. Deletes all files outside of the three directories NAME1, NAME2 or NAME3 that I need (one project was renamed NAME1 -> NAME2 -> NAME3 during its lifetime).

  2. Moves everything inside these three directories to the root of the repository.

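  3. Checks whether "$GIT_INDEX_FILE.new" exists before moving it into place. I needed this because importing svn into git creates commits without any files (directory-only commits), and for those "$GIT_INDEX_FILE.new" does not get created. The check is needed only if the repo was initially created with 'git svn clone'.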
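
To see what the sed expression in the second half of the filter does, here is a minimal sketch that runs it on one hand-written index entry instead of a real repository. The blob ID and the path NAME2/src/main.c are made up for illustration, and strip_prefix is a hypothetical helper name. It uses sed -E with a literal tab rather than the GNU-specific \t and \| from the script above, which should behave the same for this input.

```shell
# Strip a leading NAME1/, NAME2/ or NAME3/ from the path field of
# `git ls-files -s` style lines (mode, blob, stage, TAB, path).
strip_prefix() {
    tab=$(printf '\t')
    sed -E "s-${tab}(NAME1|NAME2|NAME3)/-${tab}-"
}

# Made-up index entry; in the real filter these lines come from `git ls-files -s`.
entry=$(printf '100644 %s 0\t%s\n' \
    "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" "NAME2/src/main.c")

result=$(printf '%s\n' "$entry" | strip_prefix)
printf '%s\n' "$result"
```

The rewritten line is what git update-index --index-info would read, so the file ends up at src/main.c in the new index.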

haimg answered Oct 01 '22 08:10


I had a very large repository from which I needed to extract a single folder; even --index-filter was predicted to take 8 hours to finish. Here's what I did instead:

  1. Obtain a list of all the past names of the folder. In my case there were only two, old-name and new-name.
  2. For each name:

    $ git checkout master
    $ git checkout -b filter-old-name
    $ git filter-branch --subdirectory-filter old-name

    This will give you several disconnected branches, each containing history for one of the names.

  3. The filter-old-name branch should end with the commit which renamed the folder, and the filter-new-name branch should begin with the same commit. (The same applies if there was more than one rename: you'll wind up with an equivalent number of branches, each with a commit shared with the next one along.) On the old-name branch that commit should delete everything, and on the new-name branch it should recreate everything. Make sure that what one commit deletes is identical to what the other creates; if it isn't, files were modified in addition to being renamed, and you will need to merge the changes. (In my case I didn't have this problem so I don't know how to solve it.)

    An easy way to check this is to try rebasing filter-new-name on top of filter-old-name and then squashing the two commits together: git should complain that this produces an empty commit. (Note that you will want to do this on a spare branch and then delete it: rebasing rewrites the committer name and date on the commits, thus losing some of the history you want to keep.)

  4. The next step is to graft the two branches together, skipping the two commits which renamed the folder. (Otherwise there will be a weird jump where everything is deleted and recreated.) This involves finding the full SHA (all 40 characters!) of the two commits and putting them on one line of .git/info/grafts, with the new-name branch's commit first (the child) and the old-name branch's commit second (its new parent).

    $ echo "$NEW_NAME_SECOND_COMMIT_SHA1 $OLD_NAME_PENULTIMATE_COMMIT_SHA1" >> .git/info/grafts

    If you've done this right, git log --graph should now show a line from the end of the new history to the start of the old history.

  5. This graft is currently temporary: it is not yet part of the history, and won't follow along with clones or pushes. To make it permanent:

    $ git filter-branch

    This will refilter the branch without trying to make any further changes, making the graft permanent (changing all of the commits in the filter-new-name branch). You should now be able to delete the .git/info/grafts file.
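
Since step 4 depends on getting two full 40-character SHAs onto one line in the right order, a small guard can catch the most likely mistake before the grafts file is touched. This is only a sketch: add_graft is a hypothetical helper, the two IDs are placeholders rather than real commits, and the demo writes to a temporary file instead of .git/info/grafts.

```shell
# Append "<child> <parent>" to a grafts file, refusing abbreviated or malformed SHAs.
# $1: second commit of the new-name branch, $2: penultimate commit of the
# old-name branch, $3: path to the grafts file.
add_graft() {
    for sha in "$1" "$2"; do
        case "$sha" in
            *[!0-9a-f]*) echo "not a hex SHA: $sha" >&2; return 1 ;;
        esac
        [ "${#sha}" -eq 40 ] || { echo "need all 40 characters: $sha" >&2; return 1; }
    done
    printf '%s %s\n' "$1" "$2" >> "$3"
}

# Demo with placeholder IDs and a scratch file.
grafts_file=$(mktemp)
child=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
parent=4b825dc642cb6eb9a060e54bf8d69288fbee4904

add_graft "$child" "$parent" "$grafts_file"
add_graft "abc1234" "$parent" "$grafts_file" 2>/dev/null || echo "short SHA rejected"
cat "$grafts_file"
```

In a real repository the third argument would be .git/info/grafts, and the first two would be the full SHAs copied from git log on each branch.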

At the end of all of this, you should now have on the filter-new-name branch all of the history from both names for the folder. You can then use this separate repository, or merge it into another one, or whatever you'd like to do with this history.
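
Step 1 leaves open how to find the past names. One possible approach, sketched below, is to scan the rename records that git log --name-status prints and collect the top-level directories that files moved out of. past_names is a hypothetical helper, the sample records are made up, and the sketch assumes the project is a top-level directory. It only looks one rename back, so for a chain of renames you would run it again with each name it reports.

```shell
# Read `git log --name-status` output on stdin and print the top-level
# directories from which files were renamed into directory $1.
past_names() {
    awk -F '\t' -v cur="$1" '
        $1 ~ /^R/ {
            split($2, o, "/"); split($3, n, "/")
            if (n[1] == cur && o[1] != cur) print o[1]
        }' | sort -u
}

# Made-up records in the format: status TAB old-path TAB new-path.
# In a real repository they would come from something like:
#   git log --pretty=format: --name-status -M --diff-filter=R
sample=$(printf '%s\n' \
    "$(printf 'R100\told-name/a.txt\tnew-name/a.txt')" \
    "$(printf 'R090\told-name/lib/b.c\tnew-name/lib/b.c')" \
    "$(printf 'R100\tnew-name/x\tnew-name/y')" \
    "$(printf 'M\tnew-name/a.txt')")

names=$(printf '%s\n' "$sample" | past_names new-name)
printf '%s\n' "$names"
```

Rename detection in git is heuristic, so treat the result as a list of candidates to confirm by hand rather than a definitive answer.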

Wolfgang answered Oct 01 '22 07:10