Is there a way to add a subdirectory of a remote repository into a subdirectory of my repository with git-subtree?
Suppose I have this main repository:
/
    dir1
    dir2
And this library repository:
/
    libdir
        some-file
    some-file-to-be-ignored
I want to import library/libdir into main/dir1 so that it looks like this:
/
    dir1
        some-file
    dir2
Using git-subtree, I can choose the target directory (dir1) with the --prefix argument, but can I also tell it to take only the contents of a specific directory in the subtree?
The reason for using git-subtree is that I can later synchronize the two repositories.
I've been experimenting with this, and found some partial solutions, though none are quite perfect.
For these examples, I'll consider merging the four files from contrib/completion/ of https://github.com/git/git.git into third_party/git-completion/ of the local repository.
This is probably the best way I've found. I only tested one-way merging; I haven't tried sending changes back to the upstream repository.
# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
# The trailing slash is important here!
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit
# In future, you can merge in additional changes as follows:
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
# Replace the SHA1 below with the commit hash that you most recently
# merged in using this technique (i.e. the most recent commit on
# gitgit/master at the time).
$ git diff --color=never 53e53c7c81ce2c7c4cd45f95bc095b274cb28b76:contrib/completion gitgit/master:contrib/completion | git apply -3 --directory=third_party/git-completion
# Now fix any conflicts if you'd modified third_party/git-completion.
$ git commit
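To try the steps above without touching a real project, here is a minimal sandbox sketch. It is only an illustration under stated assumptions: the two throwaway repositories, the names library, main, libdir and dir1/vendored, and the git_as helper are all made up for this example, and FETCH_HEAD stands in for gitgit/master so that no remote or branch name is needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path and file name here is hypothetical.
sandbox=$(mktemp -d)
# Hypothetical helper: run git with a fixed identity so commits work anywhere.
git_as() { git -c user.name=Example -c user.email=example@example.invalid "$@"; }

# Stand-in "upstream" repository: one directory we want, one file we do not.
git init -q "$sandbox/library"
cd "$sandbox/library"
mkdir libdir
echo "v1" > libdir/some-file
echo "noise" > some-file-to-be-ignored
git add .
git_as commit -q -m "library v1"
old_sha=$(git rev-parse HEAD)

# Stand-in local repository that will vendor libdir under dir1/vendored.
git init -q "$sandbox/main"
cd "$sandbox/main"
mkdir dir2
echo "main" > dir2/own-file
git add .
git_as commit -q -m "main: initial"

# First import: read only the libdir subtree under the chosen prefix.
git fetch -q "$sandbox/library" HEAD
git read-tree --prefix=dir1/vendored/ -u FETCH_HEAD:libdir
git_as commit -q -m "Import libdir at $old_sha"

# Upstream moves on.
cd "$sandbox/library"
echo "v2" > libdir/some-file
git_as commit -q -am "library v2"

# Later update: diff the old and new subtrees, apply with 3-way fallback.
cd "$sandbox/main"
git fetch -q "$sandbox/library" HEAD
git diff --color=never "$old_sha:libdir" FETCH_HEAD:libdir \
    | git apply -3 --directory=dir1/vendored
git_as commit -q -m "Update libdir"

imported=$(cat dir1/vendored/some-file)
echo "$imported"
```

Only libdir's contents end up under dir1/vendored; some-file-to-be-ignored never enters the local working tree.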
Since it's awkward having to remember the most recent commit SHA1 that you merged from the upstream repository, I've written this Bash function which does all the hard work for you (grabbing it from git log):
git-merge-subpath() {
    local SQUASH
    if [[ $1 == "--squash" ]]; then
        SQUASH=1
        shift
    fi
    if (( $# != 3 )); then
        local PARAMS="[--squash] SOURCE_COMMIT SOURCE_PREFIX DEST_PREFIX"
        echo "USAGE: ${FUNCNAME[0]} $PARAMS"
        return 1
    fi
    # Friendly parameter names; strip any trailing slashes from prefixes.
    local SOURCE_COMMIT="$1" SOURCE_PREFIX="${2%/}" DEST_PREFIX="${3%/}"
    local SOURCE_SHA1
    SOURCE_SHA1=$(git rev-parse --verify "$SOURCE_COMMIT^{commit}") || return 1
    local OLD_SHA1
    local GIT_ROOT=$(git rev-parse --show-toplevel)
    if [[ -n "$(ls -A "$GIT_ROOT/$DEST_PREFIX" 2> /dev/null)" ]]; then
        # OLD_SHA1 will remain empty if there is no match.
        local RE="^${FUNCNAME[0]}: [0-9a-f]{40} $SOURCE_PREFIX $DEST_PREFIX\$"
        OLD_SHA1=$(git log -1 --format=%b -E --grep="$RE" \
                   | grep --color=never -E "$RE" | tail -1 | awk '{print $2}')
    fi
    local OLD_TREEISH
    if [[ -n $OLD_SHA1 ]]; then
        OLD_TREEISH="$OLD_SHA1:$SOURCE_PREFIX"
    else
        # This is the first time git-merge-subpath is run, so diff against the
        # empty tree instead of the last commit created by git-merge-subpath.
        OLD_TREEISH=$(git hash-object -t tree /dev/null)
    fi &&
    if [[ -z $SQUASH ]]; then
        git merge -s ours --no-commit "$SOURCE_COMMIT"
    fi &&
    git diff --color=never "$OLD_TREEISH" "$SOURCE_COMMIT:$SOURCE_PREFIX" \
        | git apply -3 --directory="$DEST_PREFIX" || git mergetool
    if (( $? == 1 )); then
        echo "Uh-oh! Try cleaning up with |git reset --merge|."
    else
        # The blank line after the title keeps the marker line in the commit
        # body, which is where the git log --format=%b lookup above expects it.
        git commit -em "Merge $SOURCE_COMMIT:$SOURCE_PREFIX/ to $DEST_PREFIX/

# Feel free to edit the title and body above, but make sure to keep the
# ${FUNCNAME[0]}: line below intact, so ${FUNCNAME[0]} can find it
# again when grepping git log.
${FUNCNAME[0]}: $SOURCE_SHA1 $SOURCE_PREFIX $DEST_PREFIX"
    fi
}
Use it like this:
# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git-merge-subpath gitgit/master contrib/completion third_party/git-completion
# In future, you can merge in additional changes as follows:
$ git fetch gitgit
$ git-merge-subpath gitgit/master contrib/completion third_party/git-completion
# Now fix any conflicts if you'd modified third_party/git-completion.
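The function leans on two small tricks: a marker line stored in the commit body, which it later greps out of git log to recover the previously merged commit, and the empty tree as the diff base on the first run. The sketch below isolates both in a throwaway repository. The paths src and dest, the placeholder hash, and the sandbox itself are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository; all names below are hypothetical.
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"

# Trick 1: the id of the empty tree, used as the "old" side of the first diff.
# (In a SHA-1 repository this is 4b825dc642cb6eb9a060e54bf8d69288fbee4904.)
empty_tree=$(git hash-object -t tree /dev/null)

# Trick 2: record a marker line in the commit body, separated from the
# title by a blank line, then recover the hash from it later.
marker="git-merge-subpath"
fake_sha=0123456789abcdef0123456789abcdef01234567   # placeholder, not a real commit
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "Merge upstream:src/ to dest/

$marker: $fake_sha src dest"

# Same lookup the function performs: find the newest commit whose message
# contains the marker line, then pull the second field out of that line.
re="^$marker: [0-9a-f]{40} src dest\$"
found=$(git log -1 --format=%b -E --grep="$re" \
        | grep --color=never -E "$re" | tail -1 | awk '{print $2}')

echo "empty tree: $empty_tree"
echo "recovered:  $found"
```

If the marker line is edited or removed from a merge commit, the lookup comes back empty and the next run diffs against the empty tree as if it were the first import.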
If you're never going to make local changes to the merged-in files, i.e. you're happy to always overwrite the local subdirectory with the latest version from upstream, then a similar but simpler approach is to use git read-tree:
# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit
# In future, you can *overwrite* with the latest changes as follows:
# As above, the next line is optional (affects squashing).
$ git merge -s ours --no-commit gitgit/master
$ git rm -rf third_party/git-completion
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit
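After an overwrite like this, one quick sanity check is to compare tree ids: because trees are content-addressed, the vendored directory and the upstream subdirectory should resolve to the same id when their contents are identical. The following is a sketch in a throwaway sandbox. The upstream and local repositories, the file names a and b, and the git_as helper are assumptions, and FETCH_HEAD again stands in for gitgit/master.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; repository and file names are hypothetical.
sandbox=$(mktemp -d)
git_as() { git -c user.name=Example -c user.email=example@example.invalid "$@"; }

# Stand-in upstream with a contrib/completion directory.
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
mkdir -p contrib/completion
echo "one" > contrib/completion/a
git add .
git_as commit -q -m "upstream v1"

# Local repository: first import via read-tree.
git init -q "$sandbox/local"
cd "$sandbox/local"
git_as commit -q --allow-empty -m "local: initial"
git fetch -q "$sandbox/upstream" HEAD
git read-tree --prefix=third_party/git-completion/ -u FETCH_HEAD:contrib/completion
git_as commit -q -m "Import contrib/completion"

# Upstream changes one file and adds another.
cd "$sandbox/upstream"
echo "two" > contrib/completion/a
echo "new" > contrib/completion/b
git add .
git_as commit -q -m "upstream v2"

# Overwrite: drop the old copy, read the new subtree, commit.
cd "$sandbox/local"
git fetch -q "$sandbox/upstream" HEAD
git rm -rf -q third_party/git-completion
git read-tree --prefix=third_party/git-completion/ -u FETCH_HEAD:contrib/completion
git_as commit -q -m "Overwrite with upstream v2"

# Identical directory contents produce identical tree ids.
vendored_tree=$(git rev-parse HEAD:third_party/git-completion)
upstream_tree=$(git rev-parse FETCH_HEAD:contrib/completion)
[ "$vendored_tree" = "$upstream_tree" ] && echo "vendored copy matches upstream"
```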
I found a blog post that claimed to be able to merge (without overwriting) using a similar technique, but it didn't work when I tried it.
I did actually find a solution that uses git subtree, thanks to http://jrsmith3.github.io/merging-a-subdirectory-from-another-repo-via-git-subtree.html, but it's incredibly slow (each git subtree split command below takes me 9 minutes for a 28 MB repo with 39000 commits on a dual Xeon X5675, whereas the other solutions I found take less than a second).
If you can live with the slowness, it should be workable:
# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git checkout gitgit/master
$ git subtree split -P contrib/completion -b temporary-split-branch
$ git checkout master
$ git subtree add --squash -P third_party/git-completion temporary-split-branch
$ git branch -D temporary-split-branch
# In future, you can merge in additional changes as follows:
$ git checkout gitgit/master
$ git subtree split -P contrib/completion -b temporary-split-branch
$ git checkout master
$ git subtree merge --squash -P third_party/git-completion temporary-split-branch
# Now fix any conflicts if you'd modified third_party/git-completion.
$ git branch -D temporary-split-branch
Note that I pass in --squash to avoid polluting the local repository with lots of commits, but you can remove --squash if you'd prefer to preserve the commit history.
It's possible that subsequent splits can be made faster using --rejoin (see https://stackoverflow.com/a/16139361/691281), but I didn't test that.
The OP clearly stated that they want to merge a subdirectory of an upstream repository into a subdirectory of the local repository. If you instead want to merge an entire upstream repository into a subdirectory of your local repository, there's a simpler, cleaner, and better-supported alternative:
# Do this the first time:
$ git subtree add --squash --prefix=third_party/git https://github.com/git/git.git master
# In future, you can merge in additional changes as follows:
$ git subtree pull --squash --prefix=third_party/git https://github.com/git/git.git master
Or if you prefer to avoid repeating the repository URL, then you can add it as a remote:
# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git subtree add --squash --prefix=third_party/git gitgit/master
# In future, you can merge in additional changes as follows:
# (pull and push take the remote name and the branch as two separate arguments)
$ git subtree pull --squash --prefix=third_party/git gitgit master
# And you can push changes back upstream as follows:
$ git subtree push --prefix=third_party/git gitgit master
# Or possibly (not sure what the difference is):
$ git subtree push --squash --prefix=third_party/git gitgit master
A related technique is git submodules, but they come with annoying caveats (for example, people who clone your repository won't get the submodules unless they call git clone --recursive), so I didn't investigate whether they can support subpaths.
Edit: git-subtrac (from the author of the earlier git-subtree) seems to solve some of the problems with git submodules. So this might be a good option for merging an entire upstream repository into a subdirectory, but it still doesn't appear to support including only a subdirectory of the upstream repository.
I was able to do something like this by adding :dirname to the tree-ish passed to the read-tree command. (Note that I'm just learning git and git-subtree myself this week, trying to set up an environment similar to how I had my projects in Subversion using svn:externals, so there might be a better or easier way than the commands I'm showing here.)
So for example, using your example structure above:
git remote add library_remote _URL_TO_LIBRARY_REPO_
git fetch library_remote
git checkout -b library_branch library_remote/master
git checkout master
git read-tree --prefix=dir1/ -u library_branch:libdir
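The :dirname suffix works because a revision followed by a colon and a path names the tree object for that directory, so read-tree sees only that directory's contents. A small sketch that inspects such a tree in a throwaway repository follows; the repository and its files mirror the question's layout but are created here only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository mirroring the question's library layout (hypothetical).
sandbox=$(mktemp -d)
git init -q "$sandbox/library"
cd "$sandbox/library"
mkdir libdir
echo "payload" > libdir/some-file
echo "noise" > some-file-to-be-ignored
git add .
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "library: initial"

# <rev>:<path> resolves to the object stored at that path in the revision.
subtree_id=$(git rev-parse "HEAD:libdir")
subtree_type=$(git cat-file -t "$subtree_id")

# Listing that tree shows only what lives inside libdir.
subtree_names=$(git ls-tree --name-only "HEAD:libdir")

echo "$subtree_type: $subtree_names"
```

The listing contains some-file but not some-file-to-be-ignored, which is why reading that tree under a prefix imports just the one directory.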