Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Abysmal Performance for Count() using ZF2 Paginator on InnoDB Tables

I am attempting to use the ZF2 Paginator on some large (roughly 10 million in worst case w/o a search filter) record sets. My tables are in InnoDB format, which I understand do not keep a definite count as part of metadata.

I realize that I can extend the Zend\Paginator\Adapter\DbSelect class and implement my own count() method that uses count data that I manually store in another table, but I'm unsure on how to store counts for all the possible permutations of searches that might be done.

The default ZF2 DbSelect adapter uses this method:

<?php
public function count()
{
    if ($this->rowCount !== null) {
        return $this->rowCount;
    }

    $select = clone $this->select;
    $select->reset(Select::LIMIT);
    $select->reset(Select::OFFSET);
    $select->reset(Select::ORDER);

    $countSelect = new Select;
    $countSelect->columns(array('c' => new Expression('COUNT(1)')));
    $countSelect->from(array('original_select' => $select));

    $statement = $this->sql->prepareStatementForSqlObject($countSelect);
    $result    = $statement->execute();
    $row       = $result->current();

    $this->rowCount = $row['c'];

    return $this->rowCount;
}
?>

Here is a very simple example query that method produces for me:

SELECT
    COUNT(1) AS `c`
FROM
    (
        SELECT
            `contacts`.`id` AS `id`,
            `contacts`.`firstname` AS `firstname`,
            `contacts`.`middlename` AS `middlename`,
            `contacts`.`lastname` AS `lastname`,
            `contacts`.`gender` AS `gender`
        FROM
            `contacts`
        WHERE
            `contacts`.`trash` = '0'
    ) AS `original_select`

I'm not sure what the performance would be on MyISAM tables, but this fails for me due to it eating up all of the freeable space on the Amazon RDS (25GB, db.m1.small) instance on which it is running. As a comparison, running just the inner (original) query, it completes in 100 sec (certainly not good) and returns 7.39 million records.

Here is the EXPLAIN from the inner query (an EXPLAIN on the count one also dies because of disk space on the RDS server):

+----+-------------+----------+------+---------------+-------+---------+-------+---------+-------+
| id | select_type | table    | type | possible_keys | key   | key_len | ref   | rows    | Extra |
+----+-------------+----------+------+---------------+-------+---------+-------+---------+-------+
| 1  | SIMPLE      | contacts | ref  | trash         | trash | 1       | const | 3441317 |       |
+----+-------------+----------+------+---------------+-------+---------+-------+---------+-------+
1 rows in set (0.04 sec)

Is there anything that can be done to tune this better? Is the way that the ZF2 Paginator handles counts incompatible in some way w/ the way that InnoDB does things? How would others handle caching counts of all of the possible queries if we're allowing searches on most of the fields in the database?

Thanks in advance...

like image 738
jcq Avatar asked Jun 21 '13 18:06

jcq


2 Answers

You do not need the select from the original query - this consumes your memory/diskspace!

SELECT count( 1 ) AS `c`
FROM (
    SELECT 1
    FROM `contacts`
    WHERE `trash` = 0
) AS `original_select`

Beside that:

  • Assuming trash is just a boolean value, make it a boolean not nullable column and search for an int or boolean is true/false

    ALTER TABLE `contacts` CHANGE `trash` `trash` TINYINT( 1 ) NOT NULL 
    
  • Be sure to index the trash column

    ALTER TABLE `contacts` ADD INDEX `TRASH` ( `trash` )
    

Further more:

  • Pagination of large result sets do not necessarily require an exact count: Say we are displaying 100 entries per page, we do not need a 100000 single page N buttons. Instead calculate the page by using your offset and limit and just show single buttons for e.g. the prev/next 10 pages and combine it with some "show next/prev 10 pages" buttons.

  • When you need the possibility of "going to the last page" why not use something like a DESC order to achieve something similar.

  • Is there really a situation where someone would paginate through your 10m rows? Maybe provide advanced filters to help the user find what he needs.

like image 148
conceptdeluxe Avatar answered Nov 03 '22 16:11

conceptdeluxe


If you use this query instead:

SELECT c from
(    
  SELECT COUNT(1) AS c
  from contacts
  where trash = '0'
) AS original_select
like image 35
Hackerman Avatar answered Nov 03 '22 17:11

Hackerman