
How to handle a non-standard Subversion import to Git

Tags: git, svn, git-svn

We have a non-standard Subversion repository that we would like to convert to Git. The problem is that I don't really know where to start in order to keep the full history without ending up with a complete mess.

Our repository holds the last 6 years of history for our company's product suite and has gone through multiple restructurings. In all cases we have a core platform code base and then several projects and plugins that combine in different ways on top of the core platform.

For the first couple of years it was structured like this:

-- plugin1
   - trunk
   - branches
   - tags
-- pluginX
   - trunk
   - branches
   - tags
-- trunk   (core platform)
   - (various sub dirs)
-- branches  (various feature branches of the entire repository)
   - refactoring1
   - refactoringX
-- tags (various tags of customer releases of full repository)
   - customerX_1.x  
-- vendor  (vendor drops and tracking of 3rd party source deps)
   - 3rd_party_code_A
   - 3rd_party_code_X

Over time we added a couple more directories at the root, including:

-- releases (replaced tags; branches for released stable versions of repos)
-- sandbox  (area for misc projects of interest; should have been new repo)

Then we cleaned this up and ended up with:

-- trunk
  - platform
  - plugin1
  - pluginX
-- stable  (stable release branches of trunk)
  - 1.1
  - 1.2
-- tags    (release points; marks a point on a stable branch)
  - 1.1.1
  - 1.1.2
-- vendor
-- sandbox
-- releases (copies of old releases of interest)

So that is our history. What we want to end up with is, hopefully, much cleaner. Right now we are thinking of the base of the Git repository looking like this (basically a copy of the current 'trunk' directory):

- platform
- plugin1
- pluginX 

Branches:
  - stable/1.1
  - stable/1.2
Tags:
  - rel/1.1.1
  - rel/1.1.2

We would like to put sandbox and vendor into their own repositories. (I'm not sure how to do this, but maybe there is a way to import only a subset of an svn repository.)

As for branches and tags, we want the code from 'stable' to end up as branches and the code from 'tags' to end up as tags pointing into those stable branches.

For the older history from the original structure, we would like to keep as much history as possible without polluting the new repository. For example, being able to look back and see the changes that happened on the refactoring branches would be great, but it is not absolutely required.

We are currently debating how to proceed and how to get everything restructured and imported in a clean way. At a minimum we need the full history of the platform and plugin code across both previous repository restructurings. If possible, we would also like to get the stable and tag information from the most recent repository structure.

Does anyone have recommendations on how to do this import?

For example:

  • Is it possible to keep the full history across the restructurings?
  • Should we rewrite the Subversion repository somehow to clean it up before import, and if so, how?
  • Should we import the full history and then restructure it in Git, and if so, how?
  • Any ideas for how to make this import clean?
Allen asked Oct 09 '22 21:10


1 Answer

Depending on your situation, git-svn (with --follow-parent, which is on by default) might just do the trick as is. The first thing you should do is try a few git-svn runs, carefully spelling out the -T (trunk), -b (branches), and -t (tags) options to help it with the directory structure.
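As a rough sketch of such a trial run against the most recent layout in the question (trunk, stable, tags), something like the following might be a starting point. The URL and target directory are placeholders I made up, and the script only prints the command unless DRY_RUN=0 is set, so it is safe to try as is.

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the question).
SVN_URL="${SVN_URL:-https://svn.example.com/repo}"
TARGET="${TARGET:-product-git}"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -T names the mainline directory, -b the directory holding branches,
# -t the directory holding tags. --follow-parent is already the default,
# so it is not spelled out.
clone_latest_layout() {
    run git svn clone "$SVN_URL" -T trunk -b stable -t tags "$TARGET"
}

clone_latest_layout
```

If the result looks wrong, delete the target directory and retry with different -T/-b/-t values; both -b and -t can be given more than once if branches or tags live in several directories.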

You might run into trouble with a complicated directory structure history, though.

I recently was in a very similar situation, migrating my company's Subversion code to git, where the SVN history had gone through very similar restructuring to what you are describing. In my case, I wanted also to separate the projects from one Subversion repository to multiple Git repositories (one per project).

I was able to take the easy way out, deciding that it was not critical to migrate more than a few months of history. For each project I determined the earliest revision that git-svn could handle gracefully, and then fetched the history only from that revision onward (using git-svn's -r option). Having handled a previous VCS migration (VSS to SVN, 2005), I knew from experience that long-term history is hardly ever referred to. In any case, it is easy to leave the old Subversion server running in read-only mode, so that it can be used to look things up if necessary.
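A minimal sketch of that partial fetch, assuming the same trunk/stable/tags layout as in the question. The URL, target directory, and starting revision are all made-up placeholders; the right starting revision has to be found by trial. The script prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the original posts).
SVN_URL="${SVN_URL:-https://svn.example.com/repo}"
TARGET="${TARGET:-product-git}"
START_REV="${START_REV:-12000}"   # hypothetical first revision to import

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -r START:HEAD limits the import to revisions from START_REV onward,
# so everything before the restructuring is simply left in Subversion.
clone_from_revision() {
    run git svn clone -r "${START_REV}:HEAD" "$SVN_URL" \
        -T trunk -b stable -t tags "$TARGET"
}

clone_from_revision
```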

I don't know of any easy way to clean up Subversion's history, other than using svndumpfilter to exclude certain parts of it. If you are lucky, though, git-svn will magically do the right thing, and the history will actually look cleaner in git log than it ever did in svn log (due to the difference in how git looks at branches and tags).
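For reference, an svndumpfilter pass might look roughly like the sketch below. It assumes shell access to the repository on the server (svnadmin works on a local repository path, not a URL); the repository path and dump file names are placeholders. One pass drops sandbox and vendor, the other keeps only vendor so it can become its own repository. By default the script just prints the pipelines.

```shell
#!/bin/sh
# Placeholder path (assumption): local path of the Subversion repository.
REPO_PATH="${REPO_PATH:-/srv/svn/product}"

# filter_dump MODE OUTFILE PATH...
# MODE is "exclude" or "include", as understood by svndumpfilter.
# Prints the pipeline by default; set DRY_RUN=0 to actually run it.
filter_dump() {
    mode="$1"
    out="$2"
    shift 2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "svnadmin dump $REPO_PATH | svndumpfilter $mode $* > $out"
    else
        svnadmin dump "$REPO_PATH" | svndumpfilter "$mode" "$@" > "$out"
    fi
}

# Everything except the side projects.
filter_dump exclude product-only.dump sandbox vendor
# Only the vendor drops, for a separate repository.
filter_dump include vendor-only.dump vendor
```

Each dump can then be loaded into a fresh repository with svnadmin create and svnadmin load. Be aware that svndumpfilter may stop with an error when a kept path was copied from a filtered-out path, which is a real risk in a repository that has been restructured several times.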

In general, cleanliness and completeness of the history are two conflicting goals in a migration of this kind. Luckily, both are really overrated: they appeal to our sense of aesthetics more than they are practical necessities.

EDIT: Side tip for cleanliness: use the --prefix option on git-svn, to give the imported branches a unique prefix, since it is likely that you will have different branching conventions in git, and it makes it easy to view the svn history later.
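To illustrate, if the import was run with --prefix=svn/ together with -T trunk -b stable -t tags, the Subversion branches end up under refs/remotes/svn/ and the tags under refs/remotes/svn/tags/. The sketch below maps those refs onto the stable/ and rel/ names from the question. The helper name plan_ref is made up for this example, and it only prints the git commands it would suggest rather than running anything.

```shell
#!/bin/sh
# plan_ref REFNAME
# Hypothetical helper: prints the git command that would give an imported
# svn ref the name wanted in the question (stable/X branch or rel/X tag).
plan_ref() {
    case "$1" in
        # git-svn can create name@revision refs for older incarnations
        # of a branch or tag; this sketch leaves them alone.
        *@*) ;;
        refs/remotes/svn/tags/*)
            echo "git tag rel/${1#refs/remotes/svn/tags/} $1" ;;
        # trunk is the mainline, not a stable branch.
        refs/remotes/svn/trunk) ;;
        refs/remotes/svn/*)
            echo "git branch stable/${1#refs/remotes/svn/} $1" ;;
    esac
}

# Sample refs using the version numbers from the question.
for ref in refs/remotes/svn/trunk refs/remotes/svn/1.1 \
           refs/remotes/svn/tags/1.1.1; do
    plan_ref "$ref"
done
```

Inside the imported repository, the real list of refs can be fed in with git for-each-ref --format='%(refname)' refs/remotes/svn, and the printed commands reviewed before running them. Because everything from Subversion stays under the svn/ prefix, the original history remains easy to tell apart from branches created later in Git.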

Avi answered Oct 13 '22 11:10