
Convert SQL Server tables to MySQL using ZF2 and Doctrine2

I've developed an application for one of my clients. He already has one, so I need to convert his current database (SQL Server) to the new one (MySQL).

Some of the SQL Server tables have more than 10,000,000 records. When I started developing this converter, I began with tables that have only a few records, so I simply find all the records and save them to my new MySQL database. I'll show you some code for better understanding (this is just an example):

<?php

namespace Converter\Model;

class PostConverter extends AbstractConverter 
{

    public function convert() 
    {
        // this is the default connection, a MySQL database (new application)
        $em = $this->getEntityManager();
        // this returns an alternative connection to the SQL Server database (current application)
        $emAlternative = $this->getEntityManagerAlternative();

        // instance of Converter\Repository\Posts
        $repository = $emAlternative->getRepository('Converter\Entity\Post');

        $posts = $repository->findAll();

        foreach ($posts as $object) {
            $post = new Post();
            $post->setTitle($object->getTitle());
            $em->persist($post);
        }

        $em->flush();
    }
}

Now let's suppose the Post table has more than 10,000,000 records. I can't just find them all and iterate over them; I'd run out of RAM. So I did something like this.

Repository class:

<?php

namespace Converter\Repository;

class Posts extends \Doctrine\ORM\EntityRepository
{

    public function findPosts($limit, $offset)
    {
        $qb = $this->createQueryBuilder('Post');

        $qb->setMaxResults($limit);
        $qb->setFirstResult($offset);

        return $qb->getQuery()->getResult();
    }
}

Here I fetch only a few posts at a time inside a while loop, but it's quite slow. I couldn't find a better solution to improve the performance:

<?php

namespace Converter\Model;

class PostConverter extends AbstractConverter 
{

    public function convert() 
    {
        $em = $this->getEntityManager();
        $emAlternative = $this->getEntityManagerAlternative();

        $repository = $emAlternative->getRepository('Converter\Entity\Post');

        $limit = 1000;
        $offset = 0;

        while ($objects = $repository->findPosts($limit, $offset)) {
            foreach ($objects as $object) {
                $post = new Post();
                $post->setTitle($object->getTitle());
                $em->persist($post);
            }

            $offset += $limit;
        }

        $em->flush();
    }
}

I had never done anything like this before. Maybe I'm going about it the wrong way. I'd really appreciate it if some of you could point me to the right way, so I can move on with this.

Thank you all


EDIT

I can't just dump one into the other. What I posted here is just an example; during the conversion I have to transform almost all of the data before inserting it into the new database. The current application was developed in 2005, and the database is not even normalized.

Alfredo Costa asked Mar 22 '16


3 Answers

I'm currently building a data warehousing system with similar issues. Doctrine's own documentation correctly states:

An ORM tool is not primarily well-suited for mass inserts, updates or deletions. Every RDBMS has its own, most effective way of dealing with such operations and if the options outlined below are not sufficient for your purposes we recommend you use the tools for your particular RDBMS for these bulk operations.
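
The "options outlined below" in that quote are Doctrine's batch-processing patterns: instead of persisting millions of entities and flushing once at the end, you flush and clear the EntityManager every few thousand records so the unit of work and the identity maps stay small. Applied to the question's loop, a minimal sketch (reusing the question's entities and its findPosts() repository method) would look something like this:

<?php

namespace Converter\Model;

class PostConverter extends AbstractConverter
{
    public function convert()
    {
        $em = $this->getEntityManager();
        $emAlternative = $this->getEntityManagerAlternative();

        $repository = $emAlternative->getRepository('Converter\Entity\Post');

        $limit  = 1000;
        $offset = 0;

        while ($objects = $repository->findPosts($limit, $offset)) {
            foreach ($objects as $object) {
                $post = new Post();
                $post->setTitle($object->getTitle());
                $em->persist($post);
            }

            // write this batch, then detach everything so the unit of work
            // and the identity maps don't grow with every batch
            $em->flush();
            $em->clear();
            $emAlternative->clear();

            $offset += $limit;
        }
    }
}

Even with periodic clearing, hydrating millions of entities through the ORM is expensive, which is why the steps below skip it for the actual copy.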

This is how I would handle it:

  • Create your empty MySQL database with Doctrine's tools.
  • Make a list of all indexes and primary keys in the MySQL database and drop them. I would script this. This will remove the overhead of constant index updates until your data migration is complete.
  • Write a script to copy the data. Loop through the SQL Server data in batches of a few thousand and insert into MySQL (a sketch of this step follows the list).
    • Use PDO or native libraries. No Doctrine or query builders. Write the queries by hand.
    • Open one connection to your SQL Server and one connection to MySQL. Keep them open for the duration of the script.
    • Query in batches using LIMIT and primary key > last id. Querying using OFFSET is often slower.
    • Prepare statements outside of loops to optimize query processing.
    • Wrap each batch of inserts in one transaction to reduce transactional overhead.
    • "Manually" check referential integrity if necessary. Your tables don't have primary keys yet.
    • If you have many tables, segment your code into objects or functions so local variables can be cleared from memory and it'll be easier to debug.
    • You might want to call gc_collect_cycles() periodically. If your code is broken into objects, this is an easy way to keep memory under control.
  • Recreate the database indexes and primary keys. Bonus points if these were scripted from the beginning. Watch for any primary keys that can't be created due to mistakes with duplicate data.
  • Test and test before opening your new MySQL database to production use. You don't want to write another script to fix data migrations later.
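
Here is a minimal sketch of the copy script from step 3: plain PDO, keyset pagination on the primary key, statements prepared once, and one transaction per batch. The DSNs and the table/column names (dbo.posts, posts, id, title) are placeholders for the real schema, and the sqlsrv DSN assumes the pdo_sqlsrv driver (pdo_dblib works similarly):

<?php

// Hypothetical standalone copy script for one table. DSNs, credentials and
// the table/column names are placeholders for the real schema.
$source = new PDO('sqlsrv:Server=legacy-host;Database=legacy_db', 'user', 'pass');
$target = new PDO('mysql:host=localhost;dbname=new_db;charset=utf8mb4', 'user', 'pass');

$source->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$target->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$batchSize = 1000;
$lastId    = 0;

// prepare both statements once, outside the loop
$select = $source->prepare(
    "SELECT TOP $batchSize id, title FROM dbo.posts WHERE id > :lastId ORDER BY id"
);
$insert = $target->prepare('INSERT INTO posts (id, title) VALUES (:id, :title)');

do {
    $select->execute(['lastId' => $lastId]);
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);

    if ($rows === []) {
        break;
    }

    // one transaction per batch keeps transactional overhead low
    $target->beginTransaction();
    foreach ($rows as $row) {
        $insert->execute(['id' => $row['id'], 'title' => $row['title']]);
        $lastId = $row['id'];
    }
    $target->commit();
} while (count($rows) === $batchSize);

Because each batch resumes from the last primary key seen, every query is a cheap range scan instead of an increasingly expensive OFFSET, and both statements are parsed only once.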
Matt S answered Nov 06 '22


If the schemas in the source database (MSSQL) and the target database (MySQL) are identical or similar, I would export the records from one database and then import them into the other using purely the database tools. Example:

  1. In MSSQL, for every table, export the records to CSV
  2. In MySQL, for every table, import the records from CSV

You can use a shell script to glue all this together and automate the process.

This export/import will be reasonably fast, as it happens at the database layer. It's also the fastest you can probably get.

Moving an entire database from the model layer is going to be slower, by definition: you're going to create a model object for every row. That said, using the model layer is a good approach when the source and target schemas diverge, because then you can use the programmatic model to adapt one schema to another.
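
To make that concrete, the per-row adaptation can live in a small mapper that turns one denormalized legacy row into the new, normalized entities. Everything in this sketch is hypothetical: the Application\Entity namespace, the Author entity, and the legacy getCreatedAtString() accessor are invented to show the idea.

<?php

namespace Converter\Model;

use Converter\Entity\Post as LegacyPost; // the old, denormalized entity
use Application\Entity\Author;           // hypothetical new entity
use Application\Entity\Post;             // hypothetical new entity

class PostAdapter
{
    /**
     * Map one denormalized legacy row onto the new schema.
     * All field names here are invented for the example.
     */
    public function adapt(LegacyPost $legacy, Author $author)
    {
        $post = new Post();
        $post->setTitle(trim($legacy->getTitle()));
        $post->setAuthor($author);
        // the legacy table stores dates as plain strings
        $post->setCreatedAt(new \DateTime($legacy->getCreatedAtString()));

        return $post;
    }
}

The converter loop then only has to fetch legacy rows in batches, look up or create the related Author, call adapt(), and persist and flush the results; all of the schema knowledge stays in one place.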

In your specific example, you may see some improvement in performance if you unset($object) at the bottom of your while loop, though I doubt memory is the bottleneck. (I/O is.)

bishop answered Nov 06 '22


I've tried this approach before, and in my experience it is always faster to use the DBMS's native data dumping and restore tools than to push records through a framework like this.

I would suggest using a utility such as bcp (https://msdn.microsoft.com/en-us/library/aa337544.aspx) to dump the data out of SQL Server and then use MySQL's LOAD DATA (http://dev.mysql.com/doc/refman/5.7/en/load-data.html) or mysqlimport to bring the data into MySQL.

If you need to re-structure the data before it's loaded into MySQL, you could do that by setting up the new data structure in MySQL and then manipulating the data to be imported with a utility that can search and replace like sed.

ski4404 answered Nov 06 '22