Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres NOT IN performance

Any ideas how to speed up this query?

Input

EXPLAIN SELECT entityid FROM entity e

LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE 

l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f' 
AND 
(entityid NOT IN 
(1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids)
)

Output

Nested Loop  (cost=0.00..1452373.79 rows=3865 width=8)
  ->  Nested Loop  (cost=0.00..8.58 rows=1 width=8)
        Join Filter: (l1.level2_level2id = l2.level2id)
        ->  Seq Scan on level2entity l2  (cost=0.00..3.17 rows=1 width=8)
              Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
        ->  Seq Scan on level1entity l1  (cost=0.00..4.07 rows=107 width=16)
  ->  Index Scan using fk_fk18edb1cfb2a41235_idx on entity e  (cost=0.00..1452086.09 rows=22329 width=16)
        Index Cond: (level1_level1id = l1.level1id)

OK here a simplified version, the joins aren't the bottleneck

SELECT enitityid FROM 
(SELECT enitityid FROM enitity e LIMIT 5000) a

WHERE
(enitityid NOT IN 
(1377776,1377792,1377793,1377794,1377795, ... 50000 ids)
)

the problem is to find the enties which don't have any of these ids

EXPLAIN

Subquery Scan on a  (cost=0.00..312667.76 rows=1 width=8)
  Filter: (e.entityid <> ALL ('{1377776,1377792,1377793,1377794, ... 50000 ids}'::bigint[]))
  ->  Limit  (cost=0.00..111.51 rows=5000 width=8)
        ->  Seq Scan on entity e  (cost=0.00..29015.26 rows=1301026 width=8)
like image 854
wutzebaer Avatar asked Jul 23 '13 14:07

wutzebaer


People also ask

How do I resolve a performance issue in PostgreSQL?

Using EXPLAIN. One of the most important tools for debugging performance issues is the EXPLAIN command. It's a great way to understand what Postgres is doing behind the scenes. The result would be the execution plan for the query.

Why is PostgreSQL so slow?

PostgreSQL attempts to do a lot of its work in memory, and spread out writing to disk to minimize bottlenecks, but on an overloaded system with heavy writing, it's easily possible to see heavy reads and writes cause the whole system to slow as it catches up on the demands.

Why is Postgres not using my index?

As such, if Postgres chooses not to use your index, it is likely that it is calculating the cost of that query plan to be higher than the one it has chosen.

Where Not Exists in Postgres?

PostgreSQL EXISTS examples The NOT EXISTS is opposite to EXISTS . It means that if the subquery returns no row, the NOT EXISTS returns true. If the subquery returns one or more rows, the NOT EXISTS returns false.


2 Answers

A huge IN list is very inefficient. PostgreSQL should ideally identify it and turn it into a relation that it does an anti-join on, but at this point the query planner doesn't know how to do that, and the planning time required to identify this case would cost every query that uses NOT IN sensibly, so it'd have to be a very low cost check. See this earlier much more detailed answer on the topic.

As David Aldridge wrote this is best solved by turning it into an anti-join. I'd write it as a join over a VALUES list simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same:

SELECT entityid 
FROM entity e
LEFT JOIN level1entity l1 ON l.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT OUTER JOIN (
    VALUES
    (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
) ex(ex_entityid) ON (entityid = ex_entityid)
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f' 
AND ex_entityid IS NULL; 

For a sufficiently large set of values you might even be better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and joining on that.

More possibilities explored here:

https://stackoverflow.com/a/17038097/398670

like image 64
Craig Ringer Avatar answered Oct 28 '22 23:10

Craig Ringer


You might get a better result if you can rewrite the query to use a hash anti-join.

Something like:

with exclude_list as (
  select unnest(string_to_array('1377776,1377792,1377793,1377794,1377795, ...',','))::integer entity_id)
select entity_id
from   entity left join exclude_list on entity.entity_id = exclude_list.entity_id
where  exclude_list.entity_id is null;
like image 30
David Aldridge Avatar answered Oct 28 '22 23:10

David Aldridge