svn to git migration: incomplete history

Tags: git, svn, migration

I'm migrating my SVN repository to Git according to this answer. Everything goes smoothly, except that the resulting repository does not contain the complete history. I tracked the problem down to an "svn move" I made.

This is what I did: at the beginning, my SVN repo had no trunk, branches or tags directories. At some point I introduced them and moved everything into trunk (I then created a branch, which is why I introduced the new folder structure in the first place).

So after migrating the SVN repo to Git, only the history after the introduction of the new folder structure is available.

I reproduced this issue in a very simple scenario.

History in SVN: (image not available)

History in git: (image not available)

The zip which contains SVN repo / git repo:

https://www.dropbox.com/s/ecy54st05qah4up/svn_git_problem.zip?dl=0

Is there any way to fix this?

asked Dec 11 '14 at 18:12 by OschtärEi


1 Answer

When you specify --stdlayout, git svn clone only pays attention to SVN commits that modify files under the paths /trunk, /branches or /tags; other commits are ignored. You still end up with a valid clone of your repository, but the history from r1 up to the creation of your standard layout is lost, as you have observed. Since you want your Git repository to understand trunk, branches and tags after the layout change, you still want --stdlayout; otherwise git svn clone will combine all branches into a single tree containing /trunk, /branches and /tags, which is not what you want.

What you can do, if you care deeply about the pre-layout-change history (and if this is strictly a one-off migration, with no SVN commits after moving to Git), is run git svn clone twice: once with --stdlayout and once without. The stdlayout version becomes your eventual repository; the non-stdlayout version is used only during the migration, to stitch the pre-layout-change history underneath the new layout at the point of the reorganisation. This is done by cherry-picking all of your post-reorg commits onto a snapshot of the repository at the time of the reorg.
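
The two clones described above might be created along these lines. This is only a sketch: the URL and the directory names stdlayout and fullclone are placeholders rather than values from the question, and the run helper with its DRY_RUN variable is a hypothetical convenience for previewing the commands before executing them.

```shell
#!/bin/sh
# Sketch: clone the same SVN repository twice.
# The URL and the directory names are placeholders (assumptions).

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

clone_both() {
    svn_url=$1
    # Standard layout: trunk, branches and tags are mapped to Git refs.
    run git svn clone --stdlayout "$svn_url" stdlayout
    # No layout options: the whole SVN tree is imported as one branch,
    # so the commits made before the reorganisation are kept.
    run git svn clone "$svn_url" fullclone
}

# Usage (not executed here):
#   clone_both "http://svn.example.com/repo"
```

Setting DRY_RUN=1 before calling clone_both prints the two commands instead of running them.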

Once you have cloned both repos, you'll notice that at the point of the copy to the new layout there is a common tree hash for trunk. Here's an example (not using your repo, sorry, so the hashes differ):

(in stdlayout repo):

# git log --pretty=raw
commit 44f2f60e00117dfd51fd7d6431b697ec0ccc863d
tree 5cf62e006bb7b58171010fc0ffaba08ca97520da
parent d403c6ce0789cf584af9abb945bcfd88721e391e
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411603 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411603 +0000

    change 4 after folder structure & branch

    git-svn-id: http://<redacted>/trunk@9 4ed80924-4846-11e4-8279-c5809b3f22e4

commit d403c6ce0789cf584af9abb945bcfd88721e391e
tree d6c0d6cf271be5146b26781ab9bd78736d86ace3
parent 0c5873eab204942ffe56370cc6e1d31e5372da13
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411513 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411513 +0000

    changed: moved to new folder structure

    git-svn-id: http://<redacted>/trunk@7 4ed80924-4846-11e4-8279-c5809b3f22e4

commit 0c5873eab204942ffe56370cc6e1d31e5372da13
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411460 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411460 +0000

    new folder structure

    git-svn-id: http://<redacted>/trunk@6 4ed80924-4846-11e4-8279-c5809b3f22e4

(in the full, non-stdlayout repo):

commit ec52fff6ee1d65eadfa1d18aa4b74b553fc693e1
tree cfda32eb39248fa5969d15a21d2f8014189e88c2
parent 685fe9961abfee4d4913e83cf5a4a7e8d459a1a1
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411603 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411603 +0000

    change 4 after folder structure & branch

    git-svn-id: http://<redacted>@9 4ed80924-4846-11e4-8279-c5809b3f22e4

commit 685fe9961abfee4d4913e83cf5a4a7e8d459a1a1
tree 817306fad0ed5466d877437cdda12ff39a0df725
parent 02caf52174c588f1d394815201b764f9abdaa640
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411565 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411565 +0000

    created new branch

    git-svn-id: http://<redacted>@8 4ed80924-4846-11e4-8279-c5809b3f22e4

commit 02caf52174c588f1d394815201b764f9abdaa640
tree c041405a580beaef0a4e50923e9279e179c917a8
parent 37d77b8f1168d00b943e0bca3cab277cf89e7e84
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411513 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411513 +0000

    changed: moved to new folder structure

    git-svn-id: http://<redacted>@7 4ed80924-4846-11e4-8279-c5809b3f22e4

commit 37d77b8f1168d00b943e0bca3cab277cf89e7e84
tree d6c0d6cf271be5146b26781ab9bd78736d86ace3
parent 3a4784719bd95af5bf59de96310a1d6a38af562e
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411460 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411460 +0000

    new folder structure

    git-svn-id: http://<redacted>@6 4ed80924-4846-11e4-8279-c5809b3f22e4

commit 3a4784719bd95af5bf59de96310a1d6a38af562e
tree d6c0d6cf271be5146b26781ab9bd78736d86ace3
parent 2fb41dab5a7389ab32419b8b270d955631aaaefa
author (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411421 +0000
committer (no author) <(no author)@4ed80924-4846-11e4-8279-c5809b3f22e4> 1420411421 +0000

    update 4

... etc., continues.

Note that in the stdlayout repo, commit d403c6c has tree d6c0d6c, and in the full clone both commit 37d77b8 and its parent 3a47847 have that same tree. This may seem odd at first, until you realize that the creation of /trunk, /branches and /tags is a no-op in the git-svn clone, since Git does not track empty directories.
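
In a larger repository, finding the matching tree by eye can be tedious. The following is a rough sketch of how the comparison could be automated; the function names, file names and clone paths are made up for illustration, and it assumes both clones have a master branch.

```shell
#!/bin/sh
# Sketch: find commits in two clones that point at the same tree.
# Paths, file names and ref names are placeholders (assumptions).

# Print "<tree-hash> <commit-hash>" for every commit reachable from a ref.
tree_map() {
    git -C "$1" log --pretty='%T %H' "$2"
}

# Given two files of "<tree> <commit>" lines, print
# "<tree> <commit-from-first> <commit-from-second>" for each shared tree.
# If several commits in the first file share a tree, the one listed last
# (the oldest, in git log order) is reported.
shared_trees() {
    awk 'NR == FNR { seen[$1] = $2; next }
         ($1 in seen) { print $1, seen[$1], $2 }' "$1" "$2"
}

# Usage (not executed here):
#   tree_map stdlayout master > std.txt
#   tree_map fullclone master > full.txt
#   shared_trees std.txt full.txt
```

Commits with no files at all share Git's empty tree (4b825dc...), so a match on that hash is not a useful stitching point.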

In the stdlayout repo, you can import the full, non-stdlayout clone as a remote:

# git remote add fullclone ../fullclone
# git fetch fullclone

Then check out a new branch at the "new folder structure" commit from the full repo:

# git checkout -b fix-history 37d77b8

Then replay onto it all standard-layout commits that follow the move (d403c6c itself is excluded, since its tree is already the starting point):

# git cherry-pick d403c6c..master

This might take a while to run if you have many post-reorg commits, as each commit is re-committed on the new branch. The result should be a stitched-together history of trunk covering both sides of the reorganisation.
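
One possible sanity check afterwards, sketched on the assumption that every cherry-pick applied cleanly and that the branches are named fix-history and master as above: since fix-history started from a commit with the same tree as d403c6c, replaying the same changes should leave both branch tips with identical trees. The same_tree helper is hypothetical.

```shell
#!/bin/sh
# Sketch: compare the trees at the tips of two branches.
# Returns 0 if they match, 1 if they differ, 2 if a ref cannot be resolved.

same_tree() {
    a=$(git rev-parse "$1^{tree}") || return 2
    b=$(git rev-parse "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}

# Usage (not executed here):
#   if same_tree fix-history master; then
#       echo "trees match"
#   else
#       echo "trees differ; inspect with: git diff master fix-history"
#   fi
```

If the trees match, the two branches differ only in the history underneath them.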

answered Sep 29 '22 at 12:09 by javabrett