I recently asked a question and someone provided me with a solution that works but I forgot to mention there was that my table has millions of rows (~10 million for items table, ~1 millon for the others) and perhaps they thought I was working with a small dataset as in the example I provided.
Here is the SQL:
WITH a AS (
SELECT item.id, string_agg(prefered_store.store::varchar, ',') wishlist_stores
FROM item, list_wishlist, wishlist, prefered_store
WHERE item.list=list_wishlist.list
AND list_wishlist.wishlist=wishlist.id
AND wishlist.prefered_stores=prefered_store.id
GROUP BY item.id
), b AS (
SELECT item.id,
string_agg(
prefered_store.store::varchar || ',' || prefered_store.comment,
' ; ') item_stores_comments
FROM item, prefered_store
WHERE item.prefered_stores=prefered_store.id
GROUP BY item.id
)
SELECT a.id,item_stores_comments,wishlist_stores
FROM a,b
WHERE a.id=b.id
While it does exactly what I need, it's extremely slow. Takes ~10 minutes with limit of just one row and about 15 minutes for 10 rows. I'm still waiting to see how long it will take for a thousand rows (it's been almost and hour). Now my desktop is not the fastest: Pentium 4 with 1.5GB ram but it still doesn't feel right.
I've indexed all the fields in the WHERE clause and created primary keys where needed. Other than this is there any way to make this query run faster?
PostgreSQL 9.2
DDL: https://docs.google.com/file/d/0BwiyuwRCaqkCM09LVkJ4YlVNLWM/edit
Simple diagram with only relevant tables and fields:
EXPLAIN ANALYZE:
Merge Join (cost=23342752.95..12971604557.95 rows=863210883998 width=68) (actual time=1182616.544..1251542.167 rows=13139337 loops=1)
Merge Cond: (a.id = b.id)
CTE a
-> GroupAggregate (cost=8477658.65..8992463.86 rows=13139337 width=8) (actual time=252170.500..307061.559 rows=13139337 loops=1)
-> Sort (cost=8477658.65..8547771.35 rows=28045080 width=8) (actual time=252170.391..282495.516 rows=14870222 loops=1)
Sort Key: public.item.id
Sort Method: external merge Disk: 261528kB
-> Merge Join (cost=3010452.34..3474579.76 rows=28045080 width=8) (actual time=138444.102..210768.838 rows=14870222 loops=1)
Merge Cond: (list_wishlist.list = public.item.list)
-> Sort (cost=689954.53..695268.01 rows=2125390 width=8) (actual time=30482.447..55193.049 rows=1286901 loops=1)
Sort Key: list_wishlist.list
Sort Method: external merge Disk: 22624kB
-> Hash Join (cost=66643.55..408462.52 rows=2125390 width=8) (actual time=10417.244..26147.517 rows=1286901 loops=1)
Hash Cond: (wishlist.prefered_stores = public.prefered_store.id)
-> Hash Join (cost=38565.70..96225.43 rows=1226863 width=8) (actual time=8188.097..19815.024 rows=1226863 loops=1)
Hash Cond: (list_wishlist.wishlist = wishlist.id)
-> Seq Scan on list_wishlist (cost=0.00..22266.63 rows=1226863 width=8) (actual time=12.786..7467.442 rows=1226863 loops=1)
-> Hash (cost=20352.20..20352.20 rows=1110120 width=8) (actual time=7314.531..7314.531 rows=1110087 loops=1)
Buckets: 4096 Batches: 64 Memory Usage: 689kB
-> Seq Scan on wishlist (cost=0.00..20352.20 rows=1110120 width=8) (actual time=7.621..6572.731 rows=1110087 loops=1)
-> Hash (cost=14027.49..14027.49 rows=856349 width=8) (actual time=2159.339..2159.339 rows=856349 loops=1)
Buckets: 4096 Batches: 64 Memory Usage: 536kB
-> Seq Scan on prefered_store (cost=0.00..14027.49 rows=856349 width=8) (actual time=0.071..1602.971 rows=856349 loops=1)
-> Materialize (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=107961.603..149020.809 rows=14870219 loops=1)
-> Sort (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=107961.575..145971.848 rows=13139337 loops=1)
Sort Key: public.item.list
Sort Method: external merge Disk: 231088kB
-> Seq Scan on item (cost=0.00..228006.37 rows=13139337 width=8) (actual time=27.636..47661.750 rows=13139337 loops=1)
CTE b
-> GroupAggregate (cost=7166704.38..7843349.46 rows=13139337 width=12) (actual time=524258.000..794537.585 rows=13139337 loops=1)
-> Sort (cost=7166704.38..7223638.09 rows=22773483 width=12) (actual time=524257.908..755765.703 rows=13858612 loops=1)
Sort Key: public.item.id
Sort Method: external merge Disk: 297912kB
-> Merge Join (cost=2448353.26..2826901.79 rows=22773483 width=12) (actual time=201205.036..425873.108 rows=13858612 loops=1)
Merge Cond: (public.prefered_store.id = public.item.prefered_stores)
-> Sort (cost=127685.43..129826.31 rows=856349 width=12) (actual time=4545.447..12507.054 rows=856346 loops=1)
Sort Key: public.prefered_store.id
Sort Method: external merge Disk: 18408kB
-> Seq Scan on prefered_store (cost=0.00..14027.49 rows=856349 width=12) (actual time=0.060..2707.353 rows=856349 loops=1)
-> Materialize (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=196659.554..406944.706 rows=13858611 loops=1)
-> Sort (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=196659.535..396917.629 rows=13139337 loops=1)
Sort Key: public.item.prefered_stores
Sort Method: external merge Disk: 231096kB
-> Seq Scan on item (cost=0.00..228006.37 rows=13139337 width=8) (actual time=0.032..54885.583 rows=13139337 loops=1)
-> Sort (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=344329.838..353118.692 rows=13139337 loops=1)
Sort Key: a.id
Sort Method: external sort Disk: 259792kB
-> CTE Scan on a (cost=0.00..262786.74 rows=13139337 width=36) (actual time=252170.512..320132.738 rows=13139337 loops=1)
-> Materialize (cost=3253469.82..3319166.50 rows=13139337 width=36) (actual time=838286.670..888495.578 rows=13139337 loops=1)
-> Sort (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=838286.652..886198.912 rows=13139337 loops=1)
Sort Key: b.id
Sort Method: external sort Disk: 385320kB
-> CTE Scan on b (cost=0.00..262786.74 rows=13139337 width=36) (actual time=524258.017..811717.462 rows=13139337 loops=1)
Total runtime: 1253101.865 ms
See, in PostgreSQL CTEs are acting as an optimization fence. As a result no matter which predicates you will add to the outer part of the query, PostgreSQL will perform a full traverse of the WITH()
clauses. So to optimize you need to get rid of CTEs.
This is not easy as you prefered_store
table is denormalized.
Please, try this query (also on SQL Fiddle):
SELECT i.id item,
(SELECT string_agg(store||','||comment, ';')
FROM prefered_store WHERE id=i.prefered_stores) item_stores_comments,
string_agg(ps.store::text, ',') whishlist_stores
FROM item i
JOIN list_wishlist lw ON lw.list=i.list
JOIN wishlist w ON w.id=lw.wishlist
JOIN prefered_store ps ON ps.id=w.prefered_stores
GROUP BY i.id;
But I would recommend reviewing schema design.
Building columns of comma-separated values in SQL is almost always the wrong approach. Better to return rows of data, and let application code deal with display formatting.
Test by eliminating the string_agg() function and the GROUP BY clause.
with a as (
select item.id,
prefered_store.store wishlist_stores
from item
inner join list_wishlist on item.list=list_wishlist.list
inner join wishlist on list_wishlist.wishlist=wishlist.id
inner join prefered_store on wishlist.prefered_stores=prefered_store.id
), b as (
select item.id,
prefered_store.store,
prefered_store.comment item_stores_comments
from item
inner join prefered_store on item.prefered_stores=prefered_store.id
)
select * from a
inner join b on a.id = b.id
The SQL Fiddle you posted isn't much use. It has no primary keys, no secondary indexes, and not enough rows to avoid sequential scans.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With