
How do I split a git repository? [duplicate]

I have a Git repository which contains a number of subdirectories. Now I have found that one of the subdirectories is unrelated to the others and should be detached to a separate repository.

How can I do this while keeping the history of the files within the subdirectory?

I guess I could make a clone and remove the unwanted parts of each clone, but I suppose this would give me the complete tree when checking out an older revision etc. This might be acceptable, but I would prefer to be able to pretend that the two repositories don't have a shared history.

Just to make it clear, I have the following structure:

XYZ/
    .git/
    XY1/
    ABC/
    XY2/

But I would like this instead:

XYZ/
    .git/
    XY1/
    XY2/
ABC/
    .git/
    ABC/

matli asked Dec 11 '08 13:12


3 Answers

The Easy Way™

It turns out that this is such a common and useful practice that the overlords of Git made it really easy, but you have to have a newer version of Git (>= 1.7.11 May 2012). See the appendix for how to install the latest Git. Also, there's a real-world example in the walkthrough below.

  1. Prepare the old repo

     cd <big-repo>
     git subtree split -P <name-of-folder> -b <name-of-new-branch>
    

Note: <name-of-folder> must NOT contain a leading ./ or a trailing /. For instance, the folder named subproject MUST be passed as subproject, NOT ./subproject/

Note for Windows users: When your folder depth is > 1, <name-of-folder> must use the *nix-style folder separator (/). For instance, the folder named path1\path2\subproject MUST be passed as path1/path2/subproject

  2. Create the new repo

     mkdir ~/<new-repo> && cd ~/<new-repo>
     git init
     git pull </path/to/big-repo> <name-of-new-branch>
    
  3. Link the new repo to GitHub or wherever

     git remote add origin <git@github.com:user/new-repo.git>
     git push -u origin master
    
  4. Clean up inside <big-repo>, if desired

     git rm -rf <name-of-folder>
    

Note: This leaves all the historical references in the repository. See the Appendix below if you're actually concerned about having committed a password or you need to reduce the size of your .git folder.
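
The two notes about <name-of-folder> can be folded into a small helper. This is only a sketch: normalize_prefix is a made-up name (not a Git command), and it assumes a POSIX shell with tr available. It turns backslashes into forward slashes and strips any leading ./ and trailing /.

```shell
#!/bin/sh
# Hypothetical helper: clean up a folder name before passing it to
# "git subtree split -P". Not part of Git itself.
normalize_prefix() {
    # Windows-style separators become forward slashes
    p=$(printf '%s' "$1" | tr '\\' '/')
    # Peel off leading "./" and trailing "/" until neither remains
    while :; do
        case $p in
            ./*) p=${p#./} ;;
            */)  p=${p%/} ;;
            *)   break ;;
        esac
    done
    printf '%s\n' "$p"
}
```

It could then be used as git subtree split -P "$(normalize_prefix './subproject/')" -b <name-of-new-branch>.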


Walkthrough

These are the same steps as above, but following my exact steps for my repository instead of using <meta-named-things>.

Here's a project I have for implementing JavaScript browser modules in node:

tree ~/node-browser-compat

node-browser-compat
├── ArrayBuffer
├── Audio
├── Blob
├── FormData
├── atob
├── btoa
├── location
└── navigator

I want to split out a single folder, btoa, into a separate Git repository.

cd ~/node-browser-compat/
git subtree split -P btoa -b btoa-only

I now have a new branch, btoa-only, that only has commits for btoa and I want to create a new repository.

mkdir ~/btoa/ && cd ~/btoa/
git init
git pull ~/node-browser-compat btoa-only

Next, I create a new repo on GitHub, Bitbucket, or wherever, and add it as the origin:

git remote add origin git@github.com:node-browser-compat/btoa.git
git push -u origin master

Happy day!

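Note: If you created a repo with a README.md, .gitignore and LICENSE, you will need to pull first. On Git 2.9 and later the pull also needs --allow-unrelated-histories, because the two repositories share no commits:

git pull origin master --allow-unrelated-histories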
git push origin master

Lastly, I'll want to remove the folder from the bigger repo:

git rm -rf btoa

Appendix

Latest Git on macOS

To get the latest version of Git using Homebrew:

brew install git

Latest Git on Ubuntu

sudo apt-get update
sudo apt-get install git
git --version

If that doesn't work (you have a very old version of Ubuntu), try

sudo add-apt-repository ppa:git-core/ppa
sudo apt-get update
sudo apt-get install git

If that still doesn't work, try

sudo chmod +x /usr/share/doc/git/contrib/subtree/git-subtree.sh
sudo ln -s \
/usr/share/doc/git/contrib/subtree/git-subtree.sh \
/usr/lib/git-core/git-subtree

Thanks to rui.araujo from the comments.

Clearing your history

By default, removing files from Git doesn't actually remove them; it just commits that they aren't there anymore. If you want to actually remove the historical references (e.g. you committed a password), you need to do this:

git filter-branch --prune-empty --tree-filter 'rm -rf <name-of-folder>' HEAD

After that, you can check that your file or folder no longer shows up in the Git history at all

git log -- <name-of-folder> # should show nothing
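
To try this without risking a real repository, here is a throwaway sketch that builds a two-commit repo in a temporary directory and runs the same filter-branch command on it. The folder names secret/ and keep/, the file contents, and the demo identity are all invented for the example. FILTER_BRANCH_SQUELCH_WARNING only skips the deprecation pause that newer Git versions add.

```shell
#!/bin/sh
set -eu
# Demo identity so the commits work on a machine with no Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .

# Two commits: the first adds both folders, the second only edits keep/
mkdir secret keep
echo hunter2 > secret/password.txt
echo hello   > keep/readme.txt
git add .
git commit -q -m "add secret and keep"
echo more >> keep/readme.txt
git commit -q -am "edit keep"

# Same shape as the command above, with <name-of-folder> = secret
git filter-branch -f --prune-empty --tree-filter 'rm -rf secret' HEAD >/dev/null 2>&1

commits_touching_secret=$(git log --oneline -- secret | wc -l | tr -d ' ')
commits_total=$(git rev-list --count HEAD)
echo "commits touching secret/: $commits_touching_secret"
echo "commits left in total:    $commits_total"
```

Both commits survive the rewrite because each of them also changes keep/, but neither mentions secret/ any more.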

However, a plain git push of the rewritten history to GitHub and the like will be rejected, because the remote branch no longer fast-forwards to yours. If you do what the error suggests and git pull before you git push, you merge the old commits back in - and then you're back to having everything in your history.

So if you want to delete history from the "origin" - meaning to delete it from GitHub, Bitbucket, etc - you'll need to either force the push (git push --force) or delete the repo and re-push a pruned copy of it. But wait - there's more! - if you're really concerned about getting rid of a password or something like that, you'll need to prune the backup (see below).

Making .git smaller

The aforementioned delete history command still leaves behind a bunch of backup files - because Git is all too kind in helping you to not ruin your repo by accident. It will eventually delete orphaned files over the days and months, but it leaves them there for a while in case you realize that you accidentally deleted something you didn't want to.

So if you really want to empty the trash to reduce the clone size of a repo immediately you have to do all of this really weird stuff:

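# remove filter-branch's backup refs, expire every reflog entry right away
# (the default expiry would keep recent entries), then garbage-collect
rm -rf .git/refs/original/ && \
git reflog expire --expire=now --all && \
git gc --aggressive --prune=now

# roughly the same cleanup using lower-level commands
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune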

That said, I'd recommend not performing these steps unless you know that you need to - just in case you did prune the wrong subdirectory, y'know? The backup files shouldn't be transferred when you push the repo; they'll just be in your local copy.
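
If you do go through with it, measuring the repository before and after shows whether anything was actually reclaimed. A minimal sketch, assuming awk is available: repo_size_kib is a made-up helper name that adds up the size and size-pack fields that git count-objects -v reports (both in KiB).

```shell
#!/bin/sh
# Hypothetical helper: total size of loose plus packed objects, in KiB,
# for the repository in the current directory.
repo_size_kib() {
    git count-objects -v |
        awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}
```

Run repo_size_kib once before the cleanup commands and once after, then compare the two numbers.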

Credit

  • http://psionides.eu/2010/02/04/sharing-code-between-projects-with-git-subtree/
  • Remove a directory permanently from git
  • http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/
  • How to remove unreferenced blobs from my git repo

coolaj86 answered Oct 06 '22 18:10


Update: This process is so common that the Git team made it much simpler with a new tool, git subtree. See: Detach (move) subdirectory into separate Git repository


You want to clone your repository and then use git filter-branch to mark everything but the subdirectory you want in your new repo to be garbage-collected.

  1. To clone your local repository:

    git clone /XYZ /ABC
    

    (Note: the repository will be cloned using hard-links, but that is not a problem since the hard-linked files will not be modified in themselves - new ones will be created.)

  2. Now, let us preserve the interesting branches which we want to rewrite as well, and then remove the origin to avoid pushing there and to make sure that old commits will not be referenced by the origin:

    cd /ABC
    for i in branch1 br2 br3; do git branch -t "$i" "origin/$i"; done
    git remote rm origin
    

    or for all remote branches:

    cd /ABC
    for i in $(git branch -r | grep -v -- '->' | sed 's|.*origin/||'); do git branch -t "$i" "origin/$i"; done
    git remote rm origin
    
  3. Now you might also want to remove tags which have no relation to the subproject. You can do that later, but you might then need to prune your repo again. I did not remove them and got WARNING: Ref 'refs/tags/v0.1' is unchanged for every tag (since they were all unrelated to the subproject). Removing such tags also lets more space be reclaimed. Apparently git filter-branch should be able to rewrite other tags, but I could not verify this. If you want to remove all tags, use git tag -l | xargs git tag -d.

  4. Then use filter-branch and reset to exclude the other files, so they can be pruned. Let's also add --tag-name-filter cat --prune-empty to rewrite tags and to remove empty commits (note that rewriting tags strips their signatures):

    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter ABC -- --all
    

    or alternatively, to only rewrite the HEAD branch and ignore tags and other branches:

    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter ABC HEAD
    
  5. Then delete the backup reflogs so the space can be truly reclaimed (although now the operation is destructive)

    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
    

    and now you have a local git repository of the ABC sub-directory with all its history preserved.
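
The clone, detach and filter steps can be rehearsed on a throwaway repository first. The sketch below is not the procedure for your real repo: it builds a tiny XYZ repo with XY1/ and ABC/ in a temporary directory (file names, commit messages and the demo identity are invented), then clones it and keeps only ABC/. FILTER_BRANCH_SQUELCH_WARNING only skips the deprecation pause in newer Git versions.

```shell
#!/bin/sh
set -eu
# Demo identity so the commits work on a machine with no Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)

# A miniature "big" repo: one commit for XY1/, two for ABC/
mkdir "$work/XYZ"
cd "$work/XYZ"
git init -q .
mkdir XY1 ABC
echo one > XY1/a.txt
git add .
git commit -q -m "XY1: add a.txt"
echo two > ABC/b.txt
git add .
git commit -q -m "ABC: add b.txt"
echo three >> ABC/b.txt
git commit -q -am "ABC: edit b.txt"

# Clone, cut the link to the original, keep only ABC/
git clone -q "$work/XYZ" "$work/ABC"
cd "$work/ABC"
git remote rm origin
git filter-branch --prune-empty --subdirectory-filter ABC -- --all >/dev/null 2>&1

files=$(git ls-files)
commits=$(git rev-list --count HEAD)
echo "files at the new root: $files"
echo "commits kept:          $commits"
```

In the filtered clone, b.txt sits at the root and only the two commits that touched ABC/ remain.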

Note: For most uses, git filter-branch should indeed have the added parameter -- --all. Yes, that's really two dashes, a space, and then --all. These need to be the last parameters of the command. As Matli discovered, this keeps the project branches and tags included in the new repo.

Edit: various suggestions from comments below were incorporated to make sure, for instance, that the repository is actually shrunk (which was not always the case before).

17 revs, 12 users 35% answered Oct 06 '22 19:10


When running git filter-branch using a newer version of git (2.22+ maybe?), it says to use this new tool git-filter-repo. This tool certainly simplified things for me.

Filtering with filter-repo

Commands to create the XYZ repo from the original question:

# create local clone of original repo in directory XYZ
tmp $ git clone git@github.com:user/original.git XYZ

# switch to working in XYZ
tmp $ cd XYZ

# keep subdirectories XY1 and XY2 (dropping ABC)
XYZ $ git filter-repo --path XY1 --path XY2

# note: original remote origin was dropped
# (protecting against accidental pushes overwriting original repo data)

# XYZ $ ls -1
# XY1
# XY2

# XYZ $ git log --oneline
# last commit modifying ./XY1 or ./XY2
# first commit modifying ./XY1 or ./XY2

# point at new hosted, dedicated repo
XYZ $ git remote add origin git@github.com:user/XYZ.git

# push (and track) remote master
XYZ $ git push -u origin master

Assumptions:
  • the remote XYZ repo was new and empty before the push
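
Before pushing, it can be worth confirming that nothing outside the kept paths survives anywhere in the rewritten history. A minimal sketch using only plain Git: history_top_level is a made-up helper name, and it assumes path names without unusual characters (Git quotes those in its output).

```shell
#!/bin/sh
# Hypothetical helper: list every top-level path that any commit on any
# ref has touched, for the repository in the current directory.
history_top_level() {
    git log --all --name-only --pretty=format: |
        sed '/^$/d' |
        cut -d/ -f1 |
        sort -u
}
```

Run inside the filtered XYZ clone, it should list XY1 and XY2 and nothing else; if ABC still shows up, the filter did not do what was intended.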

Filtering and moving

In my case, I also wanted to move a couple of directories for a more consistent structure. Initially, I ran that simple filter-repo command followed by git mv dir-to-rename, but I found I could get a slightly "better" history using the --path-rename option. Instead of seeing last modified 5 hours ago on moved files in the new repo I now see last year (in the GitHub UI), which matches the modified times in the original repo.

Instead of...

git filter-repo --path XY1 --path XY2 --path inconsistent
git mv inconsistent XY3  # which updates last modification time

I ultimately ran...

git filter-repo --path XY1 --path XY2 --path inconsistent --path-rename inconsistent:XY3
Notes:
  • I thought the Git Rev News blog post explained well the reasoning behind creating yet another repo-filtering tool.
  • I initially tried creating a sub-directory matching the target repo name in the original repository and then filtering (using git filter-repo --subdirectory-filter dir-matching-new-repo-name). That command correctly converted that subdirectory to the root of the copied local repo, but it also resulted in a history of only the three commits it took to create the subdirectory. (I hadn't realized that --path could be specified multiple times, thereby obviating the need to create a subdirectory in the source repo.) Since someone had committed to the source repo by the time I noticed that I'd failed to carry forward the history, I just used git reset commit-before-subdir-move --hard after the clone command, and added --force to the filter-repo command to get it to operate on the slightly modified local clone.

    git clone ...
    git reset HEAD~7 --hard      # roll back before mistake
    git filter-repo ... --force  # tell filter-repo the alterations are expected

  • I was stumped on the install since I was unaware of the extension pattern with git, but ultimately I cloned git-filter-repo and symlinked it to $(git --exec-path):
ln -s ~/github/newren/git-filter-repo/git-filter-repo $(git --exec-path)

lpearson answered Oct 06 '22 20:10