alternative to SQL count subquery

Question

I have the following query:

SELECT DISTINCT 
    e.id, 
    folder, 
    subject, 
    in_reply_to, 
    message_id, 
    "references", 
    e.updated_at,
    (
        select count(*)  
        from emails  
        where 
        (
            select "references"[1] 
            from emails 
            where message_id = e.message_id
        ) = ANY ("references") 
        or message_id = 
        (
            select "references"[1] 
            from emails 
            where message_id = e.message_id
        )
    )
FROM "emails" e
INNER JOIN "email_participants" 
    ON ("email_participants"."email_id" = e."id") 
WHERE (("user_id" = 220) 
AND ("folder" = 'INBOX')) 
ORDER BY e."updated_at" DESC 
LIMIT 10 OFFSET 0;

Here is the EXPLAIN ANALYZE output of the above query.

The query performed fine until I added this count subquery:

(
    select count(*)  
    from emails  
    where 
    (
        select "references"[1] 
        from emails 
        where message_id = e.message_id
    ) = ANY ("references") 
    or message_id = 
    (
        select "references"[1] 
        from emails 
        where message_id = e.message_id
    )
)

In fact, I have tried simpler subqueries, and it seems to be the aggregate function itself that is taking the time.

Is there an alternative way to append the count to each result row? Should I, for example, update the results after the initial query has run?

Here is a pastebin that will create the table and also run the badly performing query at the end to display what the output should be.
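In the count subquery from the question, both inner lookups (`select "references"[1] from emails where message_id = e.message_id`) fetch the first reference of the current row a second time. If `message_id` is unique and not null in `emails` (a duplicate would already make the scalar subquery fail with "more than one row returned by a subquery used as an expression"), that value is simply `e."references"[1]`, so the query can drop the two extra lookups. This is a sketch under that assumption; it still runs one count per row, so it trims the work rather than changing its shape.

```sql
-- Sketch: the question's query with the redundant self-lookups removed.
-- ASSUMPTION: emails.message_id is unique and not null, so
--   (select "references"[1] from emails where message_id = e.message_id)
-- returns exactly e."references"[1] for the current row.
SELECT DISTINCT
    e.id,
    folder,
    subject,
    in_reply_to,
    e.message_id,
    e."references",
    e.updated_at,
    (
        -- emails whose references contain this row's first reference,
        -- plus the email whose message_id is that first reference
        SELECT count(*)
        FROM emails e2
        WHERE e."references"[1] = ANY (e2."references")
           OR e2.message_id = e."references"[1]
    ) AS message_count
FROM "emails" e
INNER JOIN "email_participants"
    ON ("email_participants"."email_id" = e."id")
WHERE (("user_id" = 220)
AND ("folder" = 'INBOX'))
ORDER BY e."updated_at" DESC
LIMIT 10 OFFSET 0;
```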

FuzzyTree · Accepted Answer

Expanding on Paul Guyot's answer, you could move the subquery into a derived table. This should perform faster because it fetches the message counts in one scan (plus a join) instead of one scan per row.

SELECT DISTINCT 
    e.id, 
    e.folder, 
    e.subject, 
    e.in_reply_to, 
    e.message_id, 
    e."references", 
    e.updated_at,
    t1.message_count
FROM "emails" e
INNER JOIN "email_participants" 
    ON ("email_participants"."email_id" = e."id") 
INNER JOIN (
    -- one pass over emails: for each message_id, count the emails whose
    -- references contain its first reference, or which are that email;
    -- LEFT JOIN + COUNT(e2.id) yields 0 for messages with no match
    SELECT COUNT(e2.id) AS message_count, e1.message_id
    FROM emails e1
    LEFT JOIN emails e2
        ON (ARRAY[e1."references"[1]] <@ e2."references"
            OR e2.message_id = e1."references"[1])
    GROUP BY e1.message_id
) t1 ON t1.message_id = e.message_id
WHERE (("user_id" = 220) 
AND ("folder" = 'INBOX')) 
ORDER BY e."updated_at" DESC 
LIMIT 10 OFFSET 0;
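One more option, not part of the answer above: with SELECT DISTINCT the count is one of the columns being de-duplicated, so PostgreSQL most likely has to compute it for every email that passes the WHERE clause before it can sort and apply LIMIT 10. Choosing the page of ten emails first and counting afterwards keeps the correlated subquery but runs it at most ten times. This sketch assumes `emails.message_id` is unique and not null, so `p."references"[1]` equals what the question's inner lookup returns.

```sql
-- Sketch: pick the page first, then compute the count only for those rows.
-- ASSUMPTION: emails.message_id is unique and not null.
SELECT
    p.*,
    (
        SELECT count(*)
        FROM emails e2
        WHERE p."references"[1] = ANY (e2."references")
           OR e2.message_id = p."references"[1]
    ) AS message_count
FROM (
    -- the original query without the count: DISTINCT, ORDER BY and LIMIT
    -- are applied here, so at most 10 rows leave this subquery
    SELECT DISTINCT
        e.id,
        e.folder,
        e.subject,
        e.in_reply_to,
        e.message_id,
        e."references",
        e.updated_at
    FROM "emails" e
    INNER JOIN "email_participants"
        ON ("email_participants"."email_id" = e."id")
    WHERE (("user_id" = 220)
    AND (e."folder" = 'INBOX'))
    ORDER BY e."updated_at" DESC
    LIMIT 10 OFFSET 0
) p
-- re-sort: the subquery's ORDER BY is not guaranteed to carry through
ORDER BY p.updated_at DESC;
```

Whether this beats the derived table depends on how many emails match `user_id = 220` and `folder = 'INBOX'`; comparing EXPLAIN ANALYZE output for both on the real data is the safest way to decide.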

Fiddle using pastebin data - http://www.sqlfiddle.com/#!15/c6298/7
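Each count probe as written scans the whole `emails` table, because `x = ANY ("references")` cannot use an index on the array column. A GIN index on `"references"` does support the containment operator, so writing that branch as `"references" @> ARRAY[x]` gives the planner an indexed path, and a B-tree index covers the `message_id = x` branch. The same concern applies to the derived-table join: its ORed array condition is not a plain equality, so it cannot be a hash join the way the `company_id` equality in the plans below is, and without suitable indexes it likely becomes a nested loop over pairs of emails. A sketch with hypothetical index names, assuming `"references"` is `text[]` (a `varchar[]` column would need the array literal cast to `varchar[]`) and using a placeholder message id:

```sql
-- Hypothetical index names. ASSUMPTION: emails."references" is text[] and
-- emails.message_id is a text-like column; adjust casts to the real schema.

-- GIN (default array_ops) supports @>, <@, && and = on the array column.
CREATE INDEX emails_references_gin ON emails USING gin ("references");

-- B-tree for the message_id branch; unnecessary if message_id already has
-- a primary key or unique constraint (those create an index themselves).
CREATE INDEX emails_message_id_idx ON emails (message_id);

-- One count probe, written so both branches can use an index.
-- '<root@example.com>' is a placeholder for a thread's first reference.
SELECT count(*)
FROM emails e2
WHERE e2."references" @> ARRAY['<root@example.com>']
   OR e2.message_id = '<root@example.com>';
```

Inside a correlated subquery the placeholder would be the outer row's first reference, e.g. `e2."references" @> ARRAY[e."references"[1]]`.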

Below are the query plans PostgreSQL produces when the count comes from a correlated subquery vs. when it comes from joining a derived table. I used one of my own tables, but I think the results should be similar.

Correlated Subquery

Limit  (cost=0.00..1123641.81 rows=1000 width=8) (actual time=11.237..5395.237 rows=1000 loops=1)
  ->  Seq Scan on visit v  (cost=0.00..44996236.24 rows=40045 width=8) (actual time=11.236..5395.014 rows=1000 loops=1)
        SubPlan 1
          ->  Aggregate  (cost=1123.61..1123.62 rows=1 width=0) (actual time=5.393..5.393 rows=1 loops=1000)
                ->  Seq Scan on visit v2  (cost=0.00..1073.56 rows=20018 width=0) (actual time=0.002..4.280 rows=21393 loops=1000)
                      Filter: (company_id = v.company_id)
                      Rows Removed by Filter: 18653
Total runtime: 5395.369 ms

Joining a Derived Table

Limit  (cost=1173.74..1211.81 rows=1000 width=12) (actual time=21.819..22.629 rows=1000 loops=1)
  ->  Hash Join  (cost=1173.74..2697.72 rows=40036 width=12) (actual time=21.817..22.465 rows=1000 loops=1)
        Hash Cond: (v.company_id = visit.company_id)
        ->  Seq Scan on visit v  (cost=0.00..973.45 rows=40045 width=8) (actual time=0.010..0.198 rows=1000 loops=1)
        ->  Hash  (cost=1173.71..1173.71 rows=2 width=12) (actual time=21.787..21.787 rows=2 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 1kB
              ->  HashAggregate  (cost=1173.67..1173.69 rows=2 width=4) (actual time=21.783..21.784 rows=3 loops=1)
                    ->  Seq Scan on visit  (cost=0.00..973.45 rows=40045 width=4) (actual time=0.003..6.695 rows=40046 loops=1)
Total runtime: 22.806 ms

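Using only the numbers in the two plans, the gap comes from how often the visit table is read. In the correlated plan, each of the 1000 SubPlan runs scans the whole table (rows kept plus Rows Removed by Filter, both reported per loop); in the derived-table plan, the HashAggregate reads it once:

```latex
\begin{aligned}
\text{rows read per SubPlan run} &= 21\,393 + 18\,653 = 40\,046 \\
\text{correlated subquery:}\quad & 1000 \times 40\,046 = 40\,046\,000 \text{ rows read}, \quad 1000 \times 5.393\ \text{ms} \approx 5393\ \text{ms of } 5395.369\ \text{ms total} \\
\text{derived table:}\quad & 1 \times 40\,046 = 40\,046 \text{ rows read}, \quad 22.806\ \text{ms total}
\end{aligned}
```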
Tags:

sql

postgresql

dagda1
