I've developed an application for one of my clients. He already has one, so I need to convert his current database (SQL Server) to the new one (MySQL).
Some SQL Server tables have over 10,000,000 records. When I started developing this converter, I began with tables that only have a few records, so I could simply find all records and save them to my new MySQL database. I'll show you some code for better understanding (this is just an example):
<?php
namespace Converter\Model;

class PostConverter extends AbstractConverter
{
    public function convert()
    {
        // this is the default connection, a MySQL database (new application)
        $em = $this->getEntityManager();
        // this returns an alternative connection to the SQL Server database (current application)
        $emAlternative = $this->getEntityManagerAlternative();
        // instance of Converter\Repository\Posts
        $repository = $emAlternative->getRepository('Converter\Entity\Post');

        $posts = $repository->findAll();
        foreach ($posts as $object) {
            $post = new Post();
            $post->setTitle($object->getTitle());
            $em->persist($post);
        }
        $em->flush();
    }
}
Now let's suppose the Post table has over 10,000,000 records. I can't just find them all and iterate over the result: I'd run out of RAM. So I did something like this.
Repository class:
<?php
namespace Converter\Repository;

class Posts extends \Doctrine\ORM\EntityRepository
{
    public function findPosts($limit, $offset)
    {
        $qb = $this->createQueryBuilder('Post');
        $qb->setMaxResults($limit);
        $qb->setFirstResult($offset);

        return $qb->getQuery()->getResult();
    }
}
Here I fetch only a few posts at a time inside the while loop, but it's still quite slow. I couldn't find a better solution to improve the performance:
<?php
namespace Converter\Model;

class PostConverter extends AbstractConverter
{
    public function convert()
    {
        $em = $this->getEntityManager();
        $emAlternative = $this->getEntityManagerAlternative();
        $repository = $emAlternative->getRepository('Converter\Entity\Post');

        $limit = 1000;
        $offset = 0;
        while ($objects = $repository->findPosts($limit, $offset)) {
            foreach ($objects as $object) {
                $post = new Post();
                $post->setTitle($object->getTitle());
                $em->persist($post);
            }
            $offset += $limit;
        }
        $em->flush();
    }
}
I had never done anything like this before. Maybe I'm going about it the wrong way. I'd really appreciate it if some of you could tell me the right one, so I can move on with this.
Thank you all.
Edit: I can't just dump one database into the other. What I posted here is just an example; during the conversion I have to handle almost all of the data before inserting it into the new database. His current application was developed in 2005, and the database is not even normalized.
I'm currently building a data warehousing system with similar issues. Doctrine's own documentation correctly states:
An ORM tool is not primarily well-suited for mass inserts, updates or deletions. Every RDBMS has its own, most effective way of dealing with such operations and if the options outlined below are not sufficient for your purposes we recommend you use the tools for your particular RDBMS for these bulk operations.
This is how I would handle it:

- Fetch the source records in batches using LIMIT and primary key > last id; querying using OFFSET is often slower (see the sketch below).
- Call gc_collect_cycles() periodically. If your code is broken into objects, this is an easy way to keep memory under control.

If the schema in the source database (MSSQL) and the target database (MySQL) are exact or similar, I would export the records from one database and then import them into the other using purely the database tools.
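To make the first two points concrete, here is a rough sketch of keyset pagination using the entities from the question. The method name findPostsAfterId, the id field and getId() are assumptions for illustration, not code from the original post:

<?php
namespace Converter\Repository;

class Posts extends \Doctrine\ORM\EntityRepository
{
    // Keyset pagination: filter on the primary key instead of paging with
    // OFFSET, which gets slower the further you page into the table.
    public function findPostsAfterId($lastId, $limit)
    {
        return $this->createQueryBuilder('Post')
            ->where('Post.id > :lastId')
            ->setParameter('lastId', $lastId)
            ->orderBy('Post.id', 'ASC')
            ->setMaxResults($limit)
            ->getQuery()
            ->getResult();
    }
}

The loop in convert() would then track the last id it saw, flush and clear after every batch, and collect garbage periodically, for example:

$lastId = 0;
$batchSize = 1000;
while ($objects = $repository->findPostsAfterId($lastId, $batchSize)) {
    foreach ($objects as $object) {
        $post = new Post();
        $post->setTitle($object->getTitle());
        $em->persist($post);
        $lastId = $object->getId();
    }
    $em->flush();            // write this batch to MySQL
    $em->clear();            // detach the new entities so memory can be freed
    $emAlternative->clear(); // detach the source entities as well
    gc_collect_cycles();     // periodically reclaim cyclic references
}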
You can use a shell script to glue all this together and automate the process.
This export/import will be fast, as it happens entirely at the database layer; it's probably the fastest approach you can get.
Moving an entire database from the model layer is going to be slower, by definition: you're going to create a model object for every row. That said, using the model layer is a good approach when the source and target schemas diverge, because then you can use the programmatic model to adapt one schema to another.
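For example, if the legacy schema stores the author as a plain text column while the new schema uses a separate Author entity, the model layer lets you adapt one to the other while converting. The Author entity, getAuthorName() and setAuthor() below are purely hypothetical names used for illustration:

foreach ($objects as $object) {
    $post = new Post();
    $post->setTitle(trim($object->getTitle()));

    // Hypothetical adaptation of a denormalized legacy column into a
    // proper relation in the new schema.
    $author = $em->getRepository('Application\Entity\Author')
        ->findOneBy(['name' => $object->getAuthorName()]);
    if ($author === null) {
        $author = new Author();
        $author->setName($object->getAuthorName());
        $em->persist($author);
    }
    $post->setAuthor($author);

    $em->persist($post);
}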
In your specific example, you may see some improvement in performance if you unset($object) at the bottom of your while loop, though I doubt memory is the bottleneck here; I/O is.
I've tried this approach before and, in my experience, it is always faster to use the DBMS's native dump and restore tools than to push records through a framework like this.
I would suggest using a utility such as bcp (https://msdn.microsoft.com/en-us/library/aa337544.aspx) to dump the data out of SQL Server, and then use MySQL's LOAD DATA (http://dev.mysql.com/doc/refman/5.7/en/load-data.html) or mysqlimport to bring the data into MySQL.
If you need to restructure the data before it is loaded into MySQL, you can do that by setting up the new data structure in MySQL first and then manipulating the data to be imported with a utility that can search and replace, such as sed.
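If you prefer to drive the MySQL side of that import from PHP instead of the mysql command-line client, one option is LOAD DATA LOCAL INFILE through PDO. This is only a sketch: the DSN, credentials, file path, table and column names are invented, and local_infile has to be enabled on both the client and the server:

<?php
// Sketch: load a tab-delimited dump (e.g. produced by bcp) into MySQL via PDO.
$pdo = new PDO(
    'mysql:host=localhost;dbname=new_application;charset=utf8mb4',
    'user',
    'secret',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // required for LOAD DATA LOCAL INFILE
);

$rows = $pdo->exec(
    "LOAD DATA LOCAL INFILE '/tmp/posts.dat'
     INTO TABLE post
     FIELDS TERMINATED BY '\\t'
     LINES TERMINATED BY '\\n'
     (title, created_at)"
);

echo $rows . " rows imported\n";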