
Symfony2 / Doctrine make $statement->execute() not "buffer" all values

I have some basic code like this (inside a controller):

$sql = 'select * from someLargeTable limit 1000';
$em = $this->getDoctrine()->getManager();
$conn = $em->getConnection();
$statement = $conn->prepare($sql);
$statement->execute();

When the result set is only a few records, the memory usage is not that bad. I echoed some debugging information before and after the $statement->execute() call and found the following for my implementation:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 1000 memory: 50.917 MB

When I raise the limit from 1000 records to 10k, the difference in memory usage grows to about 13 MB:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 10000 memory: 62.521 MB

Eventually, when retrieving around 50k records, I get close to my maximum memory allocation:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 50000 memory: 114.096 MB

With this implementation, there is no way I could write a controller (or even a command, for that matter) that lets me retrieve a CSV of the data. Sure, 50k+ entries sounds like a lot and it raises the question of why, but that's not the issue.

My ultimate question is: is it possible to tell the DBAL/Connection or DBAL/Statement, when executing, to buffer the data on the SQL side rather than in PHP in its entirety? For instance, if I have 10 million rows, could it send only the first, say, 10k rows to PHP, let me loop through them with $statement->fetch(), and when the cursor gets to the end of the 10k, truncate the array and fetch the next 10k from the DB?

Sarel asked Sep 04 '14 08:09


2 Answers

I just ran into the same problem and wanted to share a possible solution. Chances are your DBAL connection uses the PDO library with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY set to true, which means the whole result set is sent by MySQL and buffered in memory by PDO as soon as the statement executes, even if you never call $statement->fetchAll(). To fix this, we just need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false, but I did not find a way to do that through DBAL: its PDO connection is a protected property, and I found no public method to retrieve it or to call setAttribute on it.

So, in such situations, I just use my own PDO connection to save memory and speed things up. You can easily instantiate one with your doctrine db parameters like this:

// Reuse the connection parameters Doctrine already knows about.
$dbal_conn = $this->getDoctrine()->getManager()->getConnection();
$params = $dbal_conn->getParams();
$pdo_conn = new \PDO(
  'mysql:dbname='.$dbal_conn->getDatabase().';unix_socket='.$params['unix_socket'],
  $dbal_conn->getUsername(),
  $dbal_conn->getPassword()
);
// Note the leading backslash: inside a namespaced controller, a bare PDO would not resolve.
$pdo_conn->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

I am using a unix socket here, but a host name or IP address can be used in the DSN just as easily.
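As a rough sketch of the host-based variant, the DSN could be assembled from the same parameter array. The helper name buildMysqlDsn is hypothetical, the keys (unix_socket, host, port, dbname) are assumed to follow DBAL's usual connection parameter names, and the 127.0.0.1 fallback is an assumption of this sketch:

```php
<?php
declare(strict_types=1);

/**
 * Build a pdo_mysql DSN from DBAL-style connection parameters.
 * Hypothetical helper, not part of Doctrine or PDO.
 */
function buildMysqlDsn(array $params): string
{
    $parts = [];

    if (!empty($params['unix_socket'])) {
        // A socket path takes the place of host and port.
        $parts[] = 'unix_socket=' . $params['unix_socket'];
    } else {
        // Assumed default: fall back to 127.0.0.1 when no host is given.
        $parts[] = 'host=' . ($params['host'] ?? '127.0.0.1');
        if (!empty($params['port'])) {
            $parts[] = 'port=' . $params['port'];
        }
    }

    if (!empty($params['dbname'])) {
        $parts[] = 'dbname=' . $params['dbname'];
    }

    return 'mysql:' . implode(';', $parts);
}

// Example values are made up for illustration.
echo buildMysqlDsn(['host' => '127.0.0.1', 'port' => 3306, 'dbname' => 'shop']), "\n";
echo buildMysqlDsn(['unix_socket' => '/tmp/mysql.sock', 'dbname' => 'shop']), "\n";
```

The resulting string can be passed as the first argument to new \PDO(...). The unbuffered setting can also be supplied at construction time by passing [\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false] as the fourth (options) argument instead of calling setAttribute afterwards.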

kroky answered Oct 22 '22 11:10


The selected answer is wrong and @kroky's answer should be selected as the correct one.

The problem is buffered vs. unbuffered queries.

It is not a good idea to change this behaviour for all queries, though, because:

Unless the full result set was fetched from the server no further queries can be sent over the same connection.

Hence, it should only be used when necessary. Here is a full working example with >200k objects:

    $qb = ...->createQueryBuilder('p');

    // Switch the underlying PDO connection to unbuffered mode before running the query.
    $this
        ->em
        ->getConnection()
        ->getWrappedConnection()
        ->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $query = $qb->getQuery();
    $result = $query->iterate(); // hydrates one row at a time instead of the whole result
    $batchSize = 20;
    $i = 0;
    foreach ($result as $product) {
        $i++;

        // iterate() wraps each entity in an array, hence the [0].
        var_dump($product[0]->getSku());

        if (($i % $batchSize) === 0) {
            $this->em->flush();
            $this->em->clear(); // Detaches all objects from Doctrine!
        }
    }

It most likely needs some refinement.
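For the CSV export the question is after, the same row-at-a-time idea can be sketched without the ORM. The helper streamRowsToCsv and the stand-in generator fakeRows are hypothetical names used only for illustration; in real use the rows would come from an unbuffered PDO statement rather than a generator:

```php
<?php
declare(strict_types=1);

/**
 * Write rows to an open stream as CSV, one row at a time, so that only the
 * current row is held in PHP memory. Hypothetical helper, not a Doctrine API.
 *
 * @param iterable $rows rows as associative arrays (column name => value)
 * @param resource $out  an open, writable stream
 * @return int number of data rows written (header excluded)
 */
function streamRowsToCsv(iterable $rows, $out): int
{
    $count = 0;
    foreach ($rows as $row) {
        if ($count === 0) {
            // Use the keys of the first row as the header line.
            fputcsv($out, array_keys($row), ',', '"', '\\');
        }
        fputcsv($out, array_values($row), ',', '"', '\\');
        $count++;
    }

    return $count;
}

/** Stand-in for a database result set, so the sketch runs without MySQL. */
function fakeRows(int $n): \Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield ['id' => $i, 'sku' => 'SKU-' . $i];
    }
}

$out = fopen('php://memory', 'w+');
$written = streamRowsToCsv(fakeRows(3), $out);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

Since a PDOStatement is traversable, a statement obtained from an unbuffered connection could be passed in place of fakeRows(3), after calling $stmt->setFetchMode(\PDO::FETCH_ASSOC) so that each row carries only named columns. In a controller, php://output would be the natural target stream. Setting \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY back to true once the loop has finished keeps the unbuffered mode limited to this one query.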

Enrico Stahn answered Oct 22 '22 09:10