
Doctrine Paginator selects entire table (very slow)?

This is related to a previous question here: Doctrine/Symfony query builder add select on left join

I want to perform a complex join query using Doctrine ORM. I want to select 10 paginated blog posts, left joining each post's single author, the current user's like on the post, and the post's hashtags. My query builder looks like this:

$query = $em->createQueryBuilder()
            ->select('p')              
            ->from('Post', 'p')
            ->leftJoin('p.author', 'a')
            ->leftJoin('p.hashtags', 'h')
            ->leftJoin('p.likes', 'l', 'WITH', 'l.post_id = p.id AND l.user_id = 10')
            ->where("p.foo = bar")
            ->addSelect('a AS post_author')
            ->addSelect('l AS post_liked')
            ->addSelect('h AS post_hashtags')
            ->orderBy('p.time', 'DESC')
            ->setFirstResult(0)
            ->setMaxResults(10);

// FAILS - because left joined hashtag collection breaks LIMITS
$result = $query->getQuery()->getResult(); 

// WORKS - but is extremely slow (count($result) shows over 80,000 rows)
$result = new \Doctrine\ORM\Tools\Pagination\Paginator($query, true);

Strangely, count($result) on the paginator shows the total number of rows in my table (over 80,000) but traversing the $result with foreach outputs 10 Post entities, as expected. Do I need to do some additional configuration to properly limit my paginator?

If this is a limitation of the paginator class, what other options do I have? Should I write custom pagination code or use another paginator library?

(bonus): How can I hydrate an array, like $query->getQuery()->getArrayResult();?

EDIT: I left out a stray orderBy in my function. It looks like including both groupBy and orderBy causes the slowdown (using groupBy rather than the paginator). If I omit one or the other, the query is fast. I tried adding an index on the "time" column in my table, but didn't see any improvement.

Things I Tried

// works, but makes the query about 50x slower
$query->groupBy('p.id');
$result = $query->getQuery()->getArrayResult();

// adding an index on the time column (no improvement)
indexes:
    time_idx:
        columns: [ time ]

// the above two solutions don't work because MySQL ORDER BY
// ignores indexes if GROUP BY is used on a different column,
// e.g. "GROUP BY p.id ORDER BY p.time" is slow
CaptainStiggz asked Oct 16 '25 16:10


1 Answer

You should simplify your query; that would shave off some execution time. I can't test your query, but here are a few pointers:

  • don't sort in the query that computes count(); ordering does not change the total
  • you could sort with orderBy('p.id', 'DESC') instead, so that the primary key index can be used
  • instead of leftJoin() you could use join() if the joined table always has at least one matching record; with an inner join, posts without a matching record are skipped
  • KNP/Paginator uses DISTINCT to read only distinct records, which can lead to an on-disk temporary table
  • $query->getQuery()->getArrayResult() uses array hydration mode, which returns a multidimensional array and is much faster than object hydration for large result sets
  • you could use a partial select, select('partial p.{id, other used fields}'), to load only the fields you need and perhaps skip unneeded relations when using object hydration
  • check EXPLAIN for the query in the Doctrine section of the Symfony profiler; maybe the indexes are not being used
  • do p.hashtags and p.likes return only one row each, or are they one-to-many relations, which multiply the result rows?
  • maybe some design changes to Post would remove some joins:
    • define p.hashtags as @ORM\Column(type="array") and store the tags as string values; later you could perhaps run a full-text search on the serialized array
    • define p.likesCount as @ORM\Column(type="integer") holding the number of likes
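
Several of the pointers above (array hydration, to-many joins multiplying rows) can be combined into a two-step fetch. The following is only a sketch under assumptions: the function name fetchPostPage and the $userId parameter are hypothetical, the entity and field names (Post, p.time, l.user_id) are taken from the question as written, and the question's placeholder condition p.foo = bar is left out. It first pages over post IDs alone, so the LIMIT applies to posts and not to joined rows, and then loads only those posts with their joins.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Hypothetical helper: fetch one page of posts in two queries.
 * Entity and field names follow the question; adjust to your mapping.
 */
function fetchPostPage(EntityManagerInterface $em, int $userId, int $offset = 0, int $limit = 10): array
{
    // Step 1: page over posts only. There are no to-many joins here,
    // so LIMIT/OFFSET count posts, not joined rows.
    $rows = $em->createQueryBuilder()
        ->select('p.id')
        ->from('Post', 'p')
        ->orderBy('p.time', 'DESC')
        ->setFirstResult($offset)
        ->setMaxResults($limit)
        ->getQuery()
        ->getScalarResult();

    $ids = array_column($rows, 'id');
    if ($ids === []) {
        return [];
    }

    // Step 2: load only those posts with their joins, hydrated as arrays.
    // IN (...) does not preserve order, so the ORDER BY is repeated.
    return $em->createQueryBuilder()
        ->select('p', 'a', 'h', 'l')
        ->from('Post', 'p')
        ->leftJoin('p.author', 'a')
        ->leftJoin('p.hashtags', 'h')
        ->leftJoin('p.likes', 'l', 'WITH', 'l.user_id = :userId')
        ->where('p.id IN (:ids)')
        ->orderBy('p.time', 'DESC')
        ->setParameter('ids', $ids)
        ->setParameter('userId', $userId)
        ->getQuery()
        ->getArrayResult();
}
```

Whether this is faster than the paginator depends on your indexes and data, so compare the EXPLAIN output of both queries before adopting it.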

I use KnpLabs/KnpPaginatorBundle and also run into speed issues with complex queries.

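Using LIMIT x,z with a large offset is usually slow, because the database still has to read and discard all the skipped rows, and the paginator also runs a COUNT over the whole dataset to get the total. If indexes are not used, it is painfully slow.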
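
On the count itself: Doctrine's Paginator returns the total number of matching rows from count(), ignoring the LIMIT, which is why count($result) in the question reports over 80,000 while iteration yields 10 posts. That COUNT query runs only when count() is called. The sketch below (the function name paginateAsArrays is hypothetical) iterates the paginator without counting and sets array hydration on the underlying query, which the paginator respects when it fetches the page.

```php
<?php

use Doctrine\ORM\Query;
use Doctrine\ORM\QueryBuilder;
use Doctrine\ORM\Tools\Pagination\Paginator;

/**
 * Hypothetical helper: run a limited query builder through the
 * Paginator and return the current page as plain arrays.
 */
function paginateAsArrays(QueryBuilder $qb): array
{
    // The hydration mode set on the query is used when the page is fetched.
    $query = $qb->getQuery()->setHydrationMode(Query::HYDRATE_ARRAY);

    // true = the query fetch-joins a collection (e.g. p.hashtags).
    $paginator = new Paginator($query, true);

    // Iterating runs only the limited queries. Calling count($paginator)
    // would issue a separate COUNT over all matching rows.
    return iterator_to_array($paginator->getIterator(), false);
}
```

If you do need the total, consider calling count() once and caching the value instead of recomputing it on every page request.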

You could take a different approach and do custom pagination by advancing through IDs, but that would complicate your code. I have used this with large datasets such as SYSLOG tables. However, you lose sorting and the total record count.
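
A minimal sketch of that ID-advancing idea (often called keyset pagination), assuming posts are listed newest first by p.id. The names fetchPostsBefore and nextCursor are hypothetical, and the Post entity comes from the question. The query has no OFFSET: each request asks for rows with an ID below the last one already shown.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Hypothetical helper: fetch the next page of posts with an ID
 * lower than the last one the client has seen.
 */
function fetchPostsBefore(EntityManagerInterface $em, ?int $lastSeenId, int $limit = 10): array
{
    $qb = $em->createQueryBuilder()
        ->select('p')
        ->from('Post', 'p')
        ->orderBy('p.id', 'DESC')
        ->setMaxResults($limit);

    // First page: no cursor yet, so no WHERE clause.
    if ($lastSeenId !== null) {
        $qb->where('p.id < :lastSeenId')
           ->setParameter('lastSeenId', $lastSeenId);
    }

    return $qb->getQuery()->getArrayResult();
}

/**
 * Hypothetical helper: the cursor for the next request is the ID
 * of the last row on the current page, or null if the page is empty.
 */
function nextCursor(array $rows): ?int
{
    if ($rows === []) {
        return null;
    }

    return (int) $rows[array_key_last($rows)]['id'];
}
```

A caller would pass nextCursor($rows) back in as $lastSeenId for the following page. The trade-off is the one described above: no jumping to an arbitrary page, no total count, and the sort order is tied to the ID.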

Mulcek answered Oct 18 '25 07:10


