I have a table which has an index on (column A, column B). And I'm running a query that looks like this:
SELECT * FROM table WHERE (A, B) IN ((a_1, b_1), (a_2, b_2), ..., (a_5000, b_5000))
This query is very slow ! The plan looks like:
Bitmap Heap Scan on table
Recheck Cond: (((A = a_1) AND (B = b_1)) OR ((A = a_2) AND (B = b_2)) OR ...
-> BitmapOr
-> Bitmap Index Scan on idx
Index Cond: ((A = a_1) AND (B = b_1))
-> Bitmap Index Scan on idx
Index Cond: ((A = a_2) AND (B = b_2))
...(5000 other Bitmax Index Scan)
Instead of doing one index scan with 5000 values, postgres seems to be doing 5000 index scan with one value at a time, which explains why the query is so slow.
Actually it is way faster to do someting like:
SELECT * FROM table WHERE A IN (a_1, ..., a_5000)
fetch the results and then filter on column B inside the app (python).
I'd really prefer to have the results already filtered by postgres with a reasonable running time. Is there a workaround ?
A more traditional way to attack slow queries is to make use of PostgreSQL’s slow query log. The idea is: If a query takes longer than a certain amount of time, a line will be sent to the log. This way slow queries can easily be spotted so that developers and administrators can quickly react and know where to look.
Note: It’s often too heavy for PostgreSQL infrastructure if you change slow query log settings in postgresql.conf , Therefore it makes more sensible to change only for a selected database or user:
Here are my top three suggestions to handle bad performance: Each method has its own advantages and disadvantages, which will be discussed in this document A more traditional way to attack slow queries is to make use of PostgreSQL’s slow query log. The idea is: If a query takes longer than a certain amount of time, a line will be sent to the log.
PostgreSQL Server is highly configurable to collect details on query performance: slow query log, auditing execution plans with auto_explain and querying pg_stat_statements .
Try joining to a CTE:
with value_list (a,b) as (
values
(a_1, b_1),
(a_2, b_2), ...,
(a_5000, b_5000)
)
select *
from table t
join value_list v on (t.a, t.b) = (v.a, v.b);
(This assumes you have no duplicates in the list of values)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With