I wrote query
select * from table
where exists (select 1 from table1 where table.column = table1.column)
If i will change it to
select * from table
where exists (select 1 from table1 where table.column = table1.column limit 1)
Does it change the logic?
I ask because cost of the plan query was changed (17000 -> 2400). I using Postgres 9.4
update: explain analyse verbose of two query
explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1)
limit 30
"Limit (cost=62.06..74.63 rows=30 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
" -> Nested Loop (cost=62.06..287707.96 rows=686607 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
" -> Bitmap Heap Scan on public.sr_res_group rg (cost=61.62..10093.63 rows=2734 width=4) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: rg.id, rg.bdate, rg.edate, rg.is_system, rg.name, rg.department_id, rg.org_id, rg.responsible_id, rg.is_available_in_electronic_queue, rg.label_id, rg.ignore_regclinic_check, rg.note, rg.blocked, rg.block_comment, rg.template_res_grou (...)"
" Recheck Cond: (rg.responsible_id = 1)"
" -> Bitmap Index Scan on responsible_fk (cost=0.00..60.94 rows=2734 width=0) (actual time=0.015..0.015 rows=0 loops=1)"
" Index Cond: (rg.responsible_id = 1)"
" -> Index Scan using fkb95967dd9f6b119a on public.sr_srv_rendered r (cost=0.43..99.03 rows=251 width=157) (never executed)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_ (...)"
" Index Cond: (r.res_group_id = rg.id)"
"Planning time: 0.931 ms"
"Execution time: 0.355 ms"
explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1 limit 1)
limit 30
"Limit (cost=0.00..509.03 rows=30 width=157) (actual time=49392.352..49392.352 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
" -> Seq Scan on public.sr_srv_rendered r (cost=0.00..100177996.03 rows=5904050 width=157) (actual time=49392.340..49392.340 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
" Filter: (SubPlan 1)"
" Rows Removed by Filter: 11062881"
" SubPlan 1"
" -> Limit (cost=0.43..8.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=11062881)"
" Output: (1)"
" -> Index Scan using sr_res_group_pk on public.sr_res_group rg (cost=0.43..8.46 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11062881)"
" Output: 1"
" Index Cond: (rg.id = r.res_group_id)"
" Filter: (rg.responsible_id = 1)"
" Rows Removed by Filter: 1"
"Planning time: 0.694 ms"
"Execution time: 49392.495 ms"
explain without params built faster
Based on those results, the high-level logic is not changed (the same, empty, set is returned in both cases) but the plan is changed resulting in a large performance difference.
What appears to be happening is that PostgreSQL understands and is happy to turn the first case (without the LIMIT inside the EXISTS) into a Nested Loop join, while in the second case (with the LIMIT inside the EXISTS), PostgreSQL doesn't know how to turn it into a join (due to the LIMIT) and implements it using the naive approach- doing a sequential scan across the table and running the sub-query for each and every row.
PostgreSQL understands how EXISTS works and it knows that it need only find a single row, adding the "LIMIT 1" isn't necessary and, as in this case, actually ends up being detrimental.
It's possible that PostgreSQL could be improved to realize that a LIMIT 1 inside of an EXISTS is just noise and should not be meaningful, but that would increase the amount of time required to plan queries and it's not immediately clear that such time would be well spent.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With