
Intense Restructure of large Git repo into multiple new repos

I've found several simple examples using both filter-branch and subtree, but they always are just moving 1 directory around. I'd like to take the following repo:

/
  Project1.sln
  Project2.sln
  Source/
    CommonLib.Data/
    CommonLib.Web/
    Project1.Data/
    Project1.Web/
    Project1.Other/
    Project2.Data/
    Project2.Web/

And move things out to their own repos, with the following structure:

# CommonRepo
/
  CommonLib.Data/
  CommonLib.Web/

# Project1Repo
/
  Project1.sln
  Project1.Data/
  Project1.Web/
  Project1.Other/

# Project2Repo
/
  Project2.sln
  Project2.Data/
  Project2.Web/

I need to do this while maintaining the entire history. To complicate things, there are one or more branches of the original repo that correspond to each project, so the version of CommonLib each project refers to may be slightly different.

I'd like to use git subtree add to add a reference back to the CommonLib in each of the new repos at the correct tag/revision, but first I need a way to split several directories at once off into their own location.

git subtree split -P seems to accept only one directory, and I haven't been able to get filter-branch to grab multiple directories either. I'm on a Windows box, so I don't have all the scripting niceties set up to make this easier.

Any advice?

asked Mar 06 '14 21:03 by Chip Paul



2 Answers

In the end, I recommend you keep the common lib included in your projects especially due to the divergence you spoke about, so your ideal structure should be:

# CommonRepo
/
  CommonLib.Data/
  CommonLib.Web/

# Project1Repo
/
  Project1.sln
  Project1.Data/
  Project1.Web/
  Project1.Other/
  CommonLib/         # I recommend that you do whatever restructuring needed to support this in a sub-directory
    CommonLib.Data/
    CommonLib.Web/

# Project2Repo
/
  Project2.sln
  Project2.Data/
  Project2.Web/
  CommonLib/         # I recommend that you do whatever restructuring needed to support this in a sub-directory
    CommonLib.Data/
    CommonLib.Web/

Now to handle the splitting:

When you split, the resulting commit IDs are reproducible as long as you use the same options each time (for example, the same annotation), so the split histories should play nicely with merge. You can therefore start by extracting the CommonLib by itself.

  1. I recommend you clone your whole repo before starting, just to be sure you don't lose anything.

    git clone <big-repo> <big-repo-clone>
    
  2. Prepare the old repo

    pushd <big-repo-clone>
    # split for the common lib
    git checkout master  # assuming you want your common lib at master
    git subtree split --prefix=Source --branch=temp-commonLib
    
    # split the projects from their respective branches
    git checkout <branch-for-project1>
    git subtree split --prefix=Source --branch=temp-project1
    
    # split the projects from their respective branches
    git checkout <branch-for-project2>
    git subtree split --prefix=Source --branch=temp-project2
    
  3. Now we need to clean out the parts of those projects that we don't want there. Since they're mixed together you can't really use git subtree for this, but you can use filter-branch to rewrite the history without the other parts.

    # strip unrelated parts from the CommonLib
    git checkout temp-commonLib
    git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch Project1* Project2*' HEAD
    
    # strip unrelated parts from the Project1
    git checkout temp-project1
    git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project2*' HEAD
    
    # strip unrelated parts from the Project2
    git checkout temp-project2
    git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project1*' HEAD
    

    The --prune-empty option drops the commits that become empty because they only contained changes to the folders you removed.

    Note: All of these changes are made at the /Source level so that it becomes the new root for each project. You can add your solution file back in later. Alternatively, you can use this prune technique on clones instead of subtree splits, and when you're all done just move all the contents from '/Source' to '/'.

    Now your repo is going to have extra branches and backups in refs/original/refs/heads/<branch-name>. If filter-branch hits a fatal error partway through, you can re-create the branch and start again, or, if you're confident it didn't change anything yet, you can delete the backup with: git update-ref -d refs/original/refs/heads/<branch-name>.

  4. Now just create new repos to store the projects created from those branches

    popd # to get out of <big-repo-clone>
    
    mkdir <new-repo>
    pushd <new-repo>
    
    git init
    git pull <big-repo-clone> <name-of-branch> # like temp-project1
    popd # to get out of the <new-repo>
    
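  5. One last thing, let's pull the CommonRepo into the projects.

    pushd <new-project-repo>
    # <ref> is the branch or tag of the CommonLib repo that this project should use
    git subtree add --prefix=CommonLib <new-commonlib-repo> <ref>
    popd # to get out of <new-project-repo>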
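
The backup refs mentioned in step 3 can be listed and removed in bulk once the rewritten branches look right. The following is a minimal sketch rather than part of the original recipe: it builds a throwaway repo (the directory names and the demo identity are made up for illustration), runs one filter-branch pass, then lists and deletes everything under refs/original/. FILTER_BRANCH_SQUELCH_WARNING only silences the notice that recent Git versions print.

```shell
#!/bin/sh
# Sketch: list and delete the refs/original/* backups left by filter-branch.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Hypothetical sandbox repo with one commit touching two projects.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
mkdir Project1.Data Project2.Data
echo a > Project1.Data/a.cs
echo b > Project2.Data/b.cs
git add .
git commit -q -m "initial"

# Same kind of rewrite as in step 3: drop Project2 from every commit.
git filter-branch --prune-empty --index-filter \
    'git rm -r -q --cached --ignore-unmatch -- "Project2*"' HEAD

# filter-branch keeps the pre-rewrite tip under refs/original/.
git for-each-ref --format='%(refname)' refs/original/

# Delete every backup ref in one go.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done
```

Until those refs are deleted, the old history stays reachable, so it is worth checking the rewritten branch before running the loop.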
    

You then just need to bring in the .sln files (I'll leave this last step up to you).
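
If you would rather keep the .sln together with its history, the clone-based variant from the note under step 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used in the steps above: it skips git subtree, removes everything that is not Project1 in a single filter-branch pass, and then moves Source/* to the root in an ordinary commit. The toy repo, file names, and demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch: keep Project1.sln plus Source/Project1.* in one filter-branch pass.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice newer Git prints
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Build a toy history that mimics the layout in the question.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
for d in CommonLib.Data Project1.Data Project1.Web Project2.Data; do
    mkdir -p "Source/$d"
    echo "code" > "Source/$d/file.cs"
done
echo sln1 > Project1.sln
echo sln2 > Project2.sln
git add .
git commit -q -m "initial layout"
echo more >> Source/Project2.Data/file.cs
git commit -q -am "Project2-only change"   # becomes empty and is pruned
echo more >> Source/Project1.Web/file.cs
git commit -q -am "Project1 change"

# Remove everything that is not Project1 from every commit.
# The pathspecs are quoted so that git, not the shell, expands the globs.
git filter-branch --prune-empty --index-filter \
    'git rm -r -q --cached --ignore-unmatch -- Project2.sln "Source/CommonLib.*" "Source/Project2.*"' \
    HEAD

# Flatten Source/ into the root as an ordinary follow-up commit.
git mv Source/Project1.Data Source/Project1.Web .
git commit -q -m "Move Source/* to the repository root"

git log --oneline
git ls-files
```

The trade-off is that older commits still show the files under Source/, with the move recorded as one extra commit at the end. Run this on a clone, as in step 1, since filter-branch rewrites the branch in place.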

answered Sep 19 '22 17:09 by johnb003


My program git_filter can do the split for you. I wrote it because all the other solutions were very slow on our large repository. It is here:

https://github.com/slobobaby/git_filter

It creates multiple branches for each extract of the original repository. At the moment I have a test branch here:

https://github.com/slobobaby/git_filter/tree/subdir

which will create a new branch containing a subdirectory of the original repository, moved to the root of the new one.

It takes a few minutes to run compared to hours or days for the git-core based solutions.

There is a script included which then pushes these new branches to new clean repositories.

answered Sep 21 '22 17:09 by slobobaby