Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

postgres query with IN is very slow

I have a table which has an index on (column A, column B). And I'm running a query that looks like this:

SELECT * FROM table WHERE (A, B) IN ((a_1, b_1), (a_2, b_2), ..., (a_5000, b_5000))

This query is very slow ! The plan looks like:

Bitmap Heap Scan on table
  Recheck Cond: (((A = a_1) AND (B = b_1)) OR ((A = a_2) AND (B = b_2)) OR ...
  ->  BitmapOr
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_1) AND (B = b_1))
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_2) AND (B = b_2))
        ...(5000 other Bitmax Index Scan)

Instead of doing one index scan with 5000 values, postgres seems to be doing 5000 index scan with one value at a time, which explains why the query is so slow.

Actually it is way faster to do someting like:

SELECT * FROM table WHERE A IN (a_1, ..., a_5000)

fetch the results and then filter on column B inside the app (python).

I'd really prefer to have the results already filtered by postgres with a reasonable running time. Is there a workaround ?

like image 993
MG1992 Avatar asked Aug 31 '16 10:08

MG1992


People also ask

How to attack slow queries in PostgreSQL?

A more traditional way to attack slow queries is to make use of PostgreSQL’s slow query log. The idea is: If a query takes longer than a certain amount of time, a line will be sent to the log. This way slow queries can easily be spotted so that developers and administrators can quickly react and know where to look.

Is your query log too heavy for PostgreSQL infrastructure?

Note: It’s often too heavy for PostgreSQL infrastructure if you change slow query log settings in postgresql.conf , Therefore it makes more sensible to change only for a selected database or user:

How to handle bad performance in PostgreSQL?

Here are my top three suggestions to handle bad performance: Each method has its own advantages and disadvantages, which will be discussed in this document A more traditional way to attack slow queries is to make use of PostgreSQL’s slow query log. The idea is: If a query takes longer than a certain amount of time, a line will be sent to the log.

How can I monitor query performance in PostgreSQL?

PostgreSQL Server is highly configurable to collect details on query performance: slow query log, auditing execution plans with auto_explain and querying pg_stat_statements .


1 Answers

Try joining to a CTE:

with value_list (a,b) as (
  values 
      (a_1, b_1), 
      (a_2, b_2), ..., 
      (a_5000, b_5000) 
)
select *
from table t
  join value_list v on (t.a, t.b) = (v.a, v.b);

(This assumes you have no duplicates in the list of values)

like image 187
a_horse_with_no_name Avatar answered Sep 26 '22 04:09

a_horse_with_no_name