Speeding up the initial git-svn fetch

Tags: git, svn, git-svn

I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?

I'm already regularly killing and restarting the fetch because git-svn leaks memory like a sieve. The transfer is occurring over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated fiber channel arrays, so the server should have plenty of oomph. The only other thing that I can think of is to do the clone from a local copy of the SVN repository.

What have other people done in similar circumstances?

MrEvil asked Oct 13 '10 00:10



2 Answers

At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + git-svn fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.
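A minimal sketch of the truncated fetch described above, assuming the repository uses the standard trunk/branches/tags layout. The helper names `run` and `partial_fetch`, the URL, the starting revision, and the directory name are all hypothetical placeholders. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# partial_fetch URL START_REV DEST
# Hypothetical helper (not part of git-svn): set up the git-svn remote,
# then fetch only revisions from START_REV up to HEAD.
partial_fetch() {
    url=$1 start_rev=$2 dest=$3
    # --stdlayout assumes trunk/, branches/, and tags/ at the SVN root.
    run git svn init --stdlayout "$url" "$dest" || return 1
    # START_REV should be a revision that exists on the branch you want.
    run git -C "$dest" svn fetch -r "$start_rev:HEAD"
}
```

For example, `DRY_RUN=1 partial_fetch http://svn.example.com/repo 150000 repo-git` shows the two commands that would run. Since history before the starting revision is absent, git-blame will attribute older lines to the first fetched revision.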

You can further speed this up with ignore-paths to prune out subtrees that you don't want.
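As a rough illustration, `ignore-paths` takes a Perl regular expression, and paths matching it are skipped during the fetch. The helper `build_ignore_regex` and the directory names below are hypothetical, and the sketch assumes the git-svn remote has the default name `svn`. It is worth trying the pattern on a small revision range first, since how paths are matched depends on the repository layout.

```shell
# build_ignore_regex PREFIX...
# Hypothetical helper (not part of git-svn): join path prefixes into one
# anchored alternation. Prefixes are used verbatim, so regex
# metacharacters in them are NOT escaped.
build_ignore_regex() {
    regex=""
    for p in "$@"; do
        if [ -z "$regex" ]; then
            regex="$p"
        else
            regex="$regex|$p"
        fi
    done
    printf '^(?:%s)/\n' "$regex"
}

# Usage inside a repository created by `git svn init`:
#   git config svn-remote.svn.ignore-paths \
#       "$(build_ignore_regex trunk/docs trunk/third_party)"
#   git svn fetch
```

Storing the pattern in the `svn-remote.svn.ignore-paths` config key means every later fetch applies it. The same regex can also be passed for a single run with `git svn fetch --ignore-paths=...`.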

You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly, even though I wrote git-svn reset, I can't say offhand whether it will remove all revisions, so you may have to do it by hand). Then use git-svn fetch to get more revisions and git-filter-branch to reparent your old root onto the new tree. That will rewrite every commit, but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the SVN repo.

If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!

Ben Jackson answered Sep 30 '22 09:09


Apparently there is no good answer. Some work is being done on git-fast-import, but it isn't ready for prime time yet: they are still trying to figure out how to detect and represent 'svn cp' actions. The one bright spot is that someone on the list came up with an optimization for git-svn that seems to have made a big impact.

http://permalink.gmane.org/gmane.comp.version-control.git/168718

MrEvil answered Sep 30 '22 08:09