I have the following query:
SELECT DISTINCT
    e.id,
    folder,
    subject,
    in_reply_to,
    message_id,
    "references",
    e.updated_at,
    (
        SELECT count(*)
        FROM emails
        WHERE
            (
                SELECT "references"[1]
                FROM emails
                WHERE message_id = e.message_id
            ) = ANY ("references")
            OR message_id =
            (
                SELECT "references"[1]
                FROM emails
                WHERE message_id = e.message_id
            )
    )
FROM "emails" e
INNER JOIN "email_participants"
    ON ("email_participants"."email_id" = e."id")
WHERE (("user_id" = 220)
    AND ("folder" = 'INBOX'))
ORDER BY e."updated_at" DESC
LIMIT 10 OFFSET 0;
Here is the EXPLAIN ANALYZE output of the above query.
The query performed fine until I added the count subquery below:
(
    SELECT count(*)
    FROM emails
    WHERE
        (
            SELECT "references"[1]
            FROM emails
            WHERE message_id = e.message_id
        ) = ANY ("references")
        OR message_id =
        (
            SELECT "references"[1]
            FROM emails
            WHERE message_id = e.message_id
        )
)
In fact I have tried simpler subqueries, and it seems to be the aggregate function itself that is taking the time.
Is there an alternative way that I could append the count onto each result? Should I update the results after the initial query has run, for example?
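For example, the follow-up could look something like this (a sketch only; the IN list would be the message_ids returned by the first query, shown here as placeholders):

```sql
-- Hypothetical second pass: compute the counts only for the
-- ten message_ids already fetched, instead of for every row.
SELECT
    e.message_id,
    count(e2.id) AS message_count
FROM emails e
LEFT JOIN emails e2
    ON e."references"[1] = ANY (e2."references")
    OR e2.message_id = e."references"[1]
WHERE e.message_id IN ('<message_id_1>', '<message_id_2>')  -- ids from the first query
GROUP BY e.message_id;
```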
Here is a pastebin that will create the table and also run the badly performing query at the end to display what the output should be.
Expanding on Paul Guyot's answer, you could move the subquery into a derived table, which should perform faster because it fetches the message counts in a single scan (plus a join) rather than one scan per row.
SELECT DISTINCT
    e.id,
    e.folder,
    e.subject,
    in_reply_to,
    e.message_id,
    e."references",
    e.updated_at,
    t1.message_count
FROM "emails" e
INNER JOIN "email_participants"
    ON ("email_participants"."email_id" = e."id")
INNER JOIN (
    SELECT COUNT(e2.id) AS message_count, e.message_id
    FROM emails e
    LEFT JOIN emails e2
        ON (ARRAY[e."references"[1]] <@ e2."references"
            OR e2.message_id = e."references"[1])
    GROUP BY e.message_id
) t1 ON t1.message_id = e.message_id
WHERE (("user_id" = 220)
    AND ("folder" = 'INBOX'))
ORDER BY e."updated_at" DESC
LIMIT 10 OFFSET 0;
Fiddle using pastebin data - http://www.sqlfiddle.com/#!15/c6298/7
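Since you only ever display the ten rows that survive the LIMIT, another option (assuming PostgreSQL 9.3+ for LATERAL) is to apply the LIMIT first in a subquery and compute the counts only for those ten rows. This is a sketch of the idea, not a drop-in replacement:

```sql
-- Limit first, then count only for the surviving rows via LATERAL.
SELECT lim.*, cnt.message_count
FROM (
    SELECT DISTINCT
        e.id, e.folder, e.subject, e.in_reply_to,
        e.message_id, e."references", e.updated_at
    FROM "emails" e
    INNER JOIN "email_participants"
        ON ("email_participants"."email_id" = e."id")
    WHERE "user_id" = 220
      AND "folder" = 'INBOX'
    ORDER BY e."updated_at" DESC
    LIMIT 10 OFFSET 0
) lim
CROSS JOIN LATERAL (
    SELECT count(*) AS message_count
    FROM emails e2
    WHERE lim."references"[1] = ANY (e2."references")
       OR e2.message_id = lim."references"[1]
) cnt
ORDER BY lim.updated_at DESC;
```

This keeps the per-row subquery, but runs it only 10 times instead of once per candidate row, which may be preferable when the emails table is large and the result set is small.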
Below are the query plans Postgres produces for getting the count via a correlated subquery versus joining a derived table. I used one of my own tables, but the results should be similar.
Correlated Subquery
"Limit (cost=0.00..1123641.81 rows=1000 width=8) (actual time=11.237..5395.237 rows=1000 loops=1)"
" -> Seq Scan on visit v (cost=0.00..44996236.24 rows=40045 width=8) (actual time=11.236..5395.014 rows=1000 loops=1)"
" SubPlan 1"
" -> Aggregate (cost=1123.61..1123.62 rows=1 width=0) (actual time=5.393..5.393 rows=1 loops=1000)"
" -> Seq Scan on visit v2 (cost=0.00..1073.56 rows=20018 width=0) (actual time=0.002..4.280 rows=21393 loops=1000)"
" Filter: (company_id = v.company_id)"
" Rows Removed by Filter: 18653"
"Total runtime: 5395.369 ms"
Joining a Derived Table
"Limit (cost=1173.74..1211.81 rows=1000 width=12) (actual time=21.819..22.629 rows=1000 loops=1)"
" -> Hash Join (cost=1173.74..2697.72 rows=40036 width=12) (actual time=21.817..22.465 rows=1000 loops=1)"
" Hash Cond: (v.company_id = visit.company_id)"
" -> Seq Scan on visit v (cost=0.00..973.45 rows=40045 width=8) (actual time=0.010..0.198 rows=1000 loops=1)"
" -> Hash (cost=1173.71..1173.71 rows=2 width=12) (actual time=21.787..21.787 rows=2 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> HashAggregate (cost=1173.67..1173.69 rows=2 width=4) (actual time=21.783..21.784 rows=3 loops=1)"
" -> Seq Scan on visit (cost=0.00..973.45 rows=40045 width=4) (actual time=0.003..6.695 rows=40046 loops=1)"
"Total runtime: 22.806 ms"