
How can I export a large Perforce repository into a different version control system without losing the history?

At work we have a large Perforce repository (approx 40k changelists, total storage size ~145GB). We're generally happy with Perforce with only some mild gripes, but we're planning to go to a more distributed development model and as a result, would like to move to a more distributed version control system as well.

So far, I've looked at the usual suspects (git, mercurial and potentially bazaar as I have good experience with it) but our main hurdle currently is to get the version history out of Perforce and imported into the various DVCSs so we don't lose the history. We'd also prefer not to have the Perforce server hang around if we don't absolutely have to keep it - my experience with this sort of migration is that nobody looks at the old repo after a while so you'd be losing the history that way.

As there are multiple projects in the repository the idea is to split it into multiple DVCS projects when we're exporting the history as not everybody needs to be able to see every part of the history. However our biggest project still contains about 2/3rds of the committed revisions and also takes up approx 2/3rds of the storage. It also has the largest number of branches - probably around 30.

So far, I've tried the following - everything is on Windows as we're a Windows-only shop:

  • Import into Mercurial using the hg convert extension. This appears to work very well for the main branch of the project I'm converting, but attempting to convert the Perforce branches into named Mercurial branches using a branchmap still appears to produce a flat import with every checkin on the default branch. Maybe that's because I set the branch map up wrong, but hg help convert suggests that you can only turn a Perforce repo into a "flat" structure with no branches using this importer, which isn't really good enough for our use.
  • Import into Git using git-p4.py. Perforce documents using git as a distributed front end to Perforce, and basing the clone on the latest revision(s) of the repo does produce a usable git repo. Attempting to import the whole sub-project with branches breaks the importer because it runs out of memory, so I can't even tell whether it would import our repo correctly.
  • I then had this brilliant brain fart of importing the Perforce repo into SVN with all the branches mapped to appropriate SVN branches as every version control system under the sun can import from SVN. This would be only using SVN as an intermediate step in the conversion, not as the target VCS - we wouldn't really gain anything from this conversion otherwise. Using p42svn.pl, that broke fairly early on in the process as our Perforce server didn't seem to like being hammered by the script that seems to make a new connection for every file/revision.
  • I haven't looked into exporting the history into Bazaar yet as it's a bit of an also-ran.

So, my questions are:

  • Is there a good tool besides p42svn.pl to export a Perforce repo into SVN? I don't mind using SVN as an intermediate repo as it seems to make exporting into all the DVCSs we're looking at reasonably easy.
  • Has anybody successfully exported branches from Perforce into Mercurial named branches and if so, how did you do it? The docs on the convert extension seem to be a bit sparse and I don't seem to be able to find a good/working way to do this.
Timo Geusch asked Feb 28 '12 16:02


3 Answers

As you know, switching source control systems is a huge task and not one to be taken lightly. There is considerable risk and downtime, first as you make the actual transition and then again as everyone re-tools and gets up to speed with the new system.

As you are still investigating your options, I would seriously take a breath and look into P4 Sandbox to see if that will meet your requirements.

More information about P4 Sandbox is below.

Overview
- P4Sandbox Feature Demo (Video)

Blog Posts
- P4Sandbox Private local branching, distributed development, and more
- P4Sandbox’s First Submit
- Distributed Development and P4Sandbox
- Private Branching with P4Sandbox
- Task-focused Work in P4Sandbox

Forum Discussion
- New Features Discussion on the official forums

Dennis answered Sep 26 '22 14:09


My word, your repository is really almost 150 gigabytes in size? I feel sorry for the first fool who runs git clone to get a copy of the repository and discovers they're now downloading 150 gigabytes worth of data.

My suggestion: Don't bother with the entire history. All you really need are the active versions and branches. Think of this as an opportunity to toss out deadwood, and to restructure your repository.

I used to be an advocate of always getting as much history as possible, but one day I had to convert a StarTeam repository to ClearCase, and it just couldn't be done. The command line tools in StarTeam were poor, and the API just couldn't do what I needed.

We simply downloaded the versions that customers had, the branches we were working on, and a few versions of the source. We kept our old StarTeam server up and running just in case someone might need to look at the source, but no one did.

However, if you do want to go through this, it really shouldn't be that bad. You could probably write a Python or Perl script to do the conversion for you.

Perforce tracks history via numbered changesets (Perforce calls them changelists). Yes, each file has its own revision number, but you aren't really interested in that; you are more interested in the changesets.

If your last P4 changeset is 1,000, you could loop through changesets 1 to 1,000. Perforce sometimes skips a changeset number, but that's pretty easy to detect. Each changeset has a date, the name of the person who submitted it, and their comment. With this information, you commit each change to the Git repository and set the date, author, and comment of that commit to match.
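As a rough illustration of that loop, here is a minimal Python sketch. It assumes the changeset metadata has already been read into dictionaries with the keys `change`, `user`, `time` (epoch seconds) and `desc`, which is the shape I would expect from `p4 -G changes -l`. It also assumes the files for each changeset have already been synced and staged before the commit runs. The function names and the `example.invalid` e-mail domain are made up for the example, since Perforce user names do not carry a Git-style e-mail address.

```python
import os


def git_commit_command(change, email_domain="example.invalid"):
    """Return (argv, env) for a git commit that mirrors one Perforce changeset.

    `change` is a dict with the keys 'change', 'user', 'time' and 'desc'
    (an assumed shape, see the text above). The e-mail domain is a placeholder.
    """
    user = change["user"]
    # Git's internal date format: "<unix timestamp> <timezone offset>".
    # Perforce stores epoch seconds, so the original local offset is not kept.
    stamp = f"{int(change['time'])} +0000"
    author = f"{user} <{user}@{email_domain}>"
    # git refuses an empty message, so fall back to the changeset number.
    message = change["desc"].strip() or f"Perforce change {change['change']}"
    argv = ["git", "commit", "--author", author, "--date", stamp, "-m", message]
    # --author/--date only set the author fields; the committer fields
    # come from the environment.
    env = dict(
        os.environ,
        GIT_COMMITTER_NAME=user,
        GIT_COMMITTER_EMAIL=f"{user}@{email_domain}",
        GIT_COMMITTER_DATE=stamp,
    )
    return argv, env


def plan_commits(changes):
    """Order changesets by number and build one commit command for each.

    Only changesets that actually exist are listed, so gaps in the
    numbering need no special handling.
    """
    ordered = sorted(changes, key=lambda c: int(c["change"]))
    return [git_commit_command(c) for c in ordered]
```

Each `(argv, env)` pair could then be passed to `subprocess.run(argv, env=env, check=True)` inside the Git working tree, after the files for that changeset have been synced and staged.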

By the way, since you're moving to Git, I hope you'll break up your repository into separate repos. And, if you committed built objects, remove them from the Perforce repository before you move them into Git. You should never store a built object in the repository -- especially if they're binary. They take up a lot of room, and become obsolete very quickly.
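To get a first idea of how many built objects are in the depot, something like the following Python sketch could help. It takes a list of plain depot paths (for example, extracted from `p4 files` output) and separates the ones that look like build output. The suffix list and the function name are my own assumptions. Adjust the list to your toolchain, and note that some matches (third-party DLLs, say) may be checked in on purpose.

```python
from pathlib import PurePosixPath

# Assumed list of suffixes that usually indicate build output; edit to taste.
BUILD_SUFFIXES = {".obj", ".pdb", ".exe", ".dll", ".lib", ".o", ".a", ".so", ".class"}


def split_build_artifacts(depot_paths, suffixes=BUILD_SUFFIXES):
    """Split depot paths into (keep, drop) based on the file suffix.

    The comparison is case-insensitive, since Windows tools are not
    consistent about the case of file extensions.
    """
    keep, drop = [], []
    for path in depot_paths:
        suffix = PurePosixPath(path).suffix.lower()
        (drop if suffix in suffixes else keep).append(path)
    return keep, drop
```

The `drop` list is only a set of candidates to review by hand before deleting anything from the depot or excluding it from the import.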

David W. answered Sep 24 '22 14:09


We (I work at Perforce) built a product to provide a git interface to the Perforce depot.

http://www.perforce.com/product/components/git-fusion

I used this internally for over a year, and it's great: you can try out the new DVCS approach (including how many repos you want) against a "live" Perforce backend. I was the only team member using git while everyone else used p4 or p4v. In the same way, your people could start working in git while you gradually decide on your migration configuration.

There is support for mapping branches between the two systems: http://www.perforce.com/perforce/doc.current/manuals/git-fusion/index.html#chapter_dyn_ngj_3l.html#section_kkz_gqv_rl

I'm not sure whether this covers all of the systems mentioned above, since as far as I know it only gets you into git; for anything else you would still have to convert from git to X.

Tristan Juricek answered Sep 23 '22 14:09