I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?
I'm already regularly killing and restarting the fetch because git-svn leaks memory like a sieve. The transfer is over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated Fibre Channel arrays, so the server should have plenty of oomph. The only other thing I can think of is to do the clone from a local copy of the SVN repository.
What have other people done in similar circumstances?
SVN by nature is slow. Remember Git needs the entire history locally, so it checks out every revision from SVN. SVN doesn't pull the entire repository down, just a specific revision.
SVN has one central repository – which makes it easier for managers to have more of a top down approach to control, security, permissions, mirrors and dumps. Additionally, many say SVN is easier to use than Git.
git svn rebase retrieves the new changes from the SVN repository and replays your local commits in the current branch on top of them. You can also use git svn fetch to retrieve the changes from the SVN repository without touching your local branch.
You can clone a Subversion repository to your machine using git svn clone <SVN repo URL>. The code will be available as a Git repository. You can do your work there and make local commits as you please. The -r option limits the clone to a revision range, which gives a "shallow" checkout rather than the entire history and is often useful.
At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + git-svn fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.
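The init-plus-fetch approach could look roughly like the sketch below. The URL, starting revision and directory name are placeholders, --stdlayout assumes the conventional trunk/branches/tags layout, and the run helper is only an illustrative dry-run wrapper, not part of git-svn.

```shell
#!/bin/sh
# Sketch: start a git-svn clone at a chosen revision instead of r1.

# Print the command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# truncated_clone URL START_REV DIR
#   URL       - SVN repository URL (hypothetical placeholder in the usage note)
#   START_REV - first revision to import; must exist on the branch you want
#   DIR       - directory for the new Git repository
truncated_clone() {
    url=$1; start_rev=$2; dir=$3
    # Create the Git repository and record the SVN remote; nothing is fetched yet.
    run git svn init --stdlayout "$url" "$dir" || return 1
    # Fetch only START_REV..HEAD; older revisions are never downloaded.
    run git -C "$dir" svn fetch -r "${start_rev}:HEAD"
}
```

Calling truncated_clone http://svn.example.invalid/repo 150000 work prints the two commands; setting DRY_RUN=0 would actually run them.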
You can further speed this up with ignore-paths to prune out subtrees that you don't want.
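As a rough illustration of ignore-paths, the sketch below builds a regex from a list of subtrees and prints a fetch command that uses it. The directory names are made up, build_ignore_regex is a hypothetical helper, and the paths are assumed to be relative to the URL given to git svn init. The helper does not escape regex metacharacters, so it only suits plain directory names.

```shell
#!/bin/sh
# Sketch: build a regex for git-svn's --ignore-paths option.

# build_ignore_regex DIR...
# Prints a regex that matches any path underneath one of the given directories.
build_ignore_regex() {
    regex=""
    for d in "$@"; do
        # Join the directories with "|" to form an alternation.
        regex="${regex:+$regex|}$d"
    done
    printf '^(%s)/\n' "$regex"
}

# Hypothetical subtrees that are not wanted in the Git clone.
IGNORE=$(build_ignore_regex trunk/docs trunk/third_party)

# Show the fetch command; run it by hand inside the git-svn repository.
echo "git svn fetch --ignore-paths='$IGNORE'"
```

The same regex can be stored once with git config svn-remote.svn.ignore-paths so that later fetches skip the same subtrees.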
You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly I even wrote git-svn reset and I can't say offhand if it will remove all revisions, so it may be by hand). Then git-svn fetch more revisions and git-filter-branch to reparent your old root to the new tree. That will rewrite every commit but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the svn repo.
If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!
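To make the "rev tags to match git-svn" idea concrete, here is a sketch of a single commit record in git fast-import's stream format, with a git-svn-id line appended to the message. It is not the output of any particular svn-fast-export tool: emit_commit is a hypothetical helper, all values are placeholders, and a real exporter would also emit file changes and parent links.

```shell
#!/bin/sh
# Sketch: emit one git fast-import "commit" record tagged with a git-svn-id line.

# emit_commit REF REV SVN_URL UUID COMMITTER EPOCH MESSAGE
#   COMMITTER is "Name <email>"; EPOCH is seconds since 1970 (written as UTC).
emit_commit() {
    ref=$1; rev=$2; url=$3; uuid=$4; ident=$5; epoch=$6; msg=$7
    # Message plus the metadata line git-svn uses: "git-svn-id: URL@REV UUID".
    # Command substitution strips the trailing newline; it is re-added below.
    body=$(printf '%s\n\ngit-svn-id: %s@%s %s\n' "$msg" "$url" "$rev" "$uuid")
    # fast-import needs the exact byte count of the message, newline included.
    len=$(( $(printf '%s\n' "$body" | wc -c) ))
    printf 'commit %s\n' "$ref"
    printf 'committer %s %s +0000\n' "$ident" "$epoch"
    printf 'data %s\n' "$len"
    printf '%s\n' "$body"
}
```

A stream built from records like this would be piped into git fast-import inside an empty repository. Whether git-svn accepts the result as its own history depends on the URL and UUID matching the real repository.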
Apparently there is no good answer. Some work is being done on git-fast-import but it isn't ready for prime time yet. They are still trying to figure out how to detect and represent 'svn cp' actions. The one bright spot is that someone on the list came up with an optimization for git-svn that seems to have made a big impact.
http://permalink.gmane.org/gmane.comp.version-control.git/168718