I've found several simple examples using both filter-branch and subtree, but they always move just one directory. I'd like to take the following repo:
/
    Project1.sln
    Project2.sln
    Source/
        CommonLib.Data/
        CommonLib.Web/
        Project1.Data/
        Project1.Web/
        Project1.Other/
        Project2.Data/
        Project2.Web/
And move things out to their own repos, with the following structure:
# CommonRepo
/
    CommonLib.Data/
    CommonLib.Web/
# Project1Repo
/
    Project1.sln
    Project1.Data/
    Project1.Web/
    Project1.Other/
# Project2Repo
/
    Project2.sln
    Project2.Data/
    Project2.Web/
While maintaining the entire history. To complicate things, there are 1 or more branches of the original repo that correspond to each project, and thus the version of CommonLib the other projects referred to may be slightly different.
I'd like to use git subtree add to add a reference back to the CommonLib in each of the new repos at the correct tag/revision, but first I need a way to split several directories at once off into their own location.
git subtree split -P seems to accept only one directory, and I haven't been able to get filter-branch to grab multiple directories either. I'm on a Windows box, so I don't have all the scripting niceties set up to make this easier.
Any advice?
Using submodules: one way around the problem of large files is to use submodules, which let you manage one Git repository within another. You can create a submodule that contains all your binary files, keep the rest of the code separately in the parent repository, and update the submodule only when necessary.
That way, the parent repository stays smaller than your full source tree. But even then, if you keep committing large files, the repo size may still grow because of the version history, and you will have to reduce it to keep working with the repo smoothly.
In the end, I recommend you keep the common lib included in your projects especially due to the divergence you spoke about, so your ideal structure should be:
# CommonRepo
/
    CommonLib.Data/
    CommonLib.Web/
# Project1Repo
/
    Project1.sln
    Project1.Data/
    Project1.Web/
    Project1.Other/
    CommonLib/ # I recommend that you do whatever restructuring is needed to support this in a sub-directory
        CommonLib.Data/
        CommonLib.Web/
# Project2Repo
/
    Project2.sln
    Project2.Data/
    Project2.Web/
    CommonLib/ # I recommend that you do whatever restructuring is needed to support this in a sub-directory
        CommonLib.Data/
        CommonLib.Web/
When you split, as long as you don't use different annotations (for example via --annotate) or something similar, the commit ids will be compatible and should play nicely with merge. So you can start by extracting the CommonLib by itself.
I recommend you clone your whole repo before starting, just to be sure you don't lose anything.
git clone <big-repo> <big-repo-clone>
Prepare the old repo
pushd <big-repo-clone>
# split for the common lib
git checkout master # assuming you want your common lib at master
git subtree split --prefix=Source --branch=temp-commonLib
# split the projects from their respective branches
git checkout <branch-for-project1>
git subtree split --prefix=Source --branch=temp-project1
# split the projects from their respective branches
git checkout <branch-for-project2>
git subtree split --prefix=Source --branch=temp-project2
Now we need to clean out the parts of those projects that we don't want there. Since the directories are mixed together under Source, you can't really use subtree for this, but you can use filter-branch to rewrite the history without the other parts.
# strip unrelated parts from the CommonLib
git checkout temp-commonLib
git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch Project1* Project2*' HEAD
# strip unrelated parts from the Project1
git checkout temp-project1
git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project2*' HEAD
# strip unrelated parts from the Project2
git checkout temp-project2
git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project1*' HEAD
The --prune-empty option strips the commits that become empty because they only contained changes in the folders you removed.
Note: All of these changes are made at the /Source level so that it can become the new root for each project. You can add your solution file back in later. Alternatively, you can use this prune technique with clones instead of subtrees, and when you're all done just move all the contents from '/Source' to '/'.
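To see what the index filter and --prune-empty do together, here is a minimal sketch on a throwaway repo. The directory and file names are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING is only set to skip the warning pause that newer Git versions add. It assumes a POSIX shell such as Git Bash.

```shell
#!/bin/sh
# Toy demonstration only: the layout and file names are illustrative.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Three commits, each touching exactly one top-level directory.
for dir in Project1.Data CommonLib.Data Project2.Data; do
    mkdir "$dir"
    echo "content" > "$dir/file.txt"
    git add "$dir"
    git commit -q -m "Add $dir"
done

# Remove everything that is not Project1 from every commit, then drop
# the commits that no longer change anything.
git filter-branch --prune-empty --index-filter \
    'git rm -rf --cached --ignore-unmatch -q "CommonLib*" "Project2*"' HEAD >/dev/null

remaining_commits=$(git rev-list --count HEAD)
remaining_dirs=$(git ls-tree --name-only HEAD)
echo "commits left: $remaining_commits"
echo "top-level entries: $remaining_dirs"
```

On this toy history, only the commit that added Project1.Data should survive, because the other two commits touched nothing but removed directories.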
Now your repo is going to have extra branches and backups in refs/original/refs/heads/<branch-name>. If you get a fatal error from filter-branch during the process, you can re-create the branch and start again, or, if you're confident it didn't do anything yet, you can delete this backup with: git update-ref -d refs/original/refs/heads/<branch-name>
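If several branches were rewritten, the backups can be removed in one pass instead of one ref at a time. This is a sketch on a throwaway repo (names are made up for the demo); the for-each-ref loop at the end is the part you would run in your own clone, and only once you are sure you no longer need the backups.

```shell
#!/bin/sh
# Toy demonstration only: the layout and file names are illustrative.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir Project1.Data CommonLib.Data
echo a > Project1.Data/a.txt
echo b > CommonLib.Data/b.txt
git add .
git commit -q -m "Initial layout"

# Rewriting the branch leaves a backup under refs/original/.
git filter-branch --index-filter \
    'git rm -rf --cached --ignore-unmatch -q "CommonLib*"' HEAD >/dev/null
backups_before=$(git for-each-ref --format='%(refname)' refs/original | wc -l | tr -d ' ')

# Delete every backup ref that filter-branch created.
git for-each-ref --format='%(refname)' refs/original |
while read -r ref; do
    git update-ref -d "$ref"
done
backups_after=$(git for-each-ref --format='%(refname)' refs/original | wc -l | tr -d ' ')

echo "backup refs before: $backups_before, after: $backups_after"
```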
Now just create new repos to store the projects created from those branches:
popd # to get out of <big-repo-clone>
mkdir <new-repo>
pushd <new-repo>
git init
git pull <big-repo-clone> <name-of-branch> # like temp-project1
popd # to get out of the <new-repo>
One last thing: let's pull the CommonRepo into the projects.
pushd <new-project-repo>
git subtree add --prefix=CommonLib <new-commonlib-repo> <branch-or-tag> # subtree add needs a ref when given a repository
You then just need to bring in the .sln files (I'll leave this last step up to you).
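One possible way to bring a solution file over, sketched on throwaway repos: read the file straight out of the old repo with git show and commit it in the new one. The repo paths and file contents below are made up for the demo, and this is only an assumption about how you might do it. Note that it copies the current version of the file and does not carry over its history.

```shell
#!/bin/sh
# Toy demonstration only: repo paths and file contents are illustrative.
set -e
work=$(mktemp -d)
big="$work/big-repo-clone"
new="$work/Project1Repo"

# Stand-in for the original repo, with a solution file at its root.
git init -q "$big"
cd "$big"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "solution-file-contents" > Project1.sln
git add Project1.sln
git commit -q -m "Add Project1.sln"

# Stand-in for the freshly split project repo.
git init -q "$new"
cd "$new"
git config user.name "Demo"
git config user.email "demo@example.com"

# Copy the file as it exists at the tip of the old repo's checked-out branch.
git --git-dir="$big/.git" show HEAD:Project1.sln > Project1.sln
git add Project1.sln
git commit -q -m "Bring Project1.sln over from the original repo"

copied=$(cat Project1.sln)
new_commits=$(git rev-list --count HEAD)
echo "copied contents: $copied"
```

In a real repo you would replace HEAD with the branch for that project, and the solution file will probably need its project paths edited, since the projects no longer live under Source.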
My program git_filter can do the split for you. I wrote it because all the other solutions were very slow on our large repository. It is here:
https://github.com/slobobaby/git_filter
It creates multiple branches for each extract of the original repository. At the moment I have a test branch here:
https://github.com/slobobaby/git_filter/tree/subdir
which will create a new branch containing a subdirectory of the original repository, moved to the root of the new one.
It takes a few minutes to run compared to hours or days for the git-core based solutions.
There is a script included which then pushes these new branches to new clean repositories.