PostgreSQL query is slow when using NOT IN

Tags: sql, postgresql

I have a PostgreSQL function that returns its query result to the pgAdmin results grid REALLY FAST. Internally, it is a simple function that uses dblink to connect to another database and returns that query's result, so that I can simply run

SELECT * FROM get_customer_trans();

And it runs just like a basic table query.

The problem appears when I use a NOT IN clause. I want to run the following query, but it takes forever:

SELECT * FROM get_customer_trans()
WHERE user_email NOT IN 
    (SELECT do_not_email_address FROM do_not_email_tbl);

How can I speed this up? Is there anything faster than a NOT IN clause for this scenario?

Horse Voice asked Jun 08 '13 at 04:06




1 Answer

get_customer_trans() is not a table; it is probably a stored procedure, so the query is not really trivial. You would need to look at what this stored procedure actually does to understand why it might be slow.

However, regardless of what the stored procedure does, adding the following index should help a lot:

CREATE INDEX do_not_email_tbl_idx1
    ON do_not_email_tbl(do_not_email_address);

This index should let the NOT IN query return its answer quickly. However, NOT IN is known to have issues in older PostgreSQL versions, so make sure that you are running PostgreSQL 9.1 or later.
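To show why NOT IN deserves care, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are hypothetical, and a plain `customer_trans` table (a hypothetical name) replaces the dblink function. It demonstrates the NULL rule of NOT IN: once the subquery returns a NULL, no row can satisfy the condition. As far as I know, this same rule is why PostgreSQL does not plan `NOT IN (subquery)` as an anti-join. Instead, it evaluates it as a hashed subplan when the subquery result fits in `work_mem`. Otherwise it uses a plain subplan that rescans the materialized subquery result for every outer row, which can be very slow.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL; all data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_trans (user_email TEXT);
    CREATE TABLE do_not_email_tbl (do_not_email_address TEXT);
    INSERT INTO customer_trans VALUES ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO do_not_email_tbl VALUES ('b@example.com');
""")

# Same shape as the query in the question, with a table instead of the function.
NOT_IN = """
    SELECT user_email
    FROM customer_trans
    WHERE user_email NOT IN (SELECT do_not_email_address FROM do_not_email_tbl)
    ORDER BY user_email
"""

def not_in_emails():
    """Run the NOT IN query and return the surviving e-mail addresses."""
    return [row[0] for row in conn.execute(NOT_IN)]

before_null = not_in_emails()

# A single NULL in the exclusion list turns each "not found" comparison into
# NULL (unknown) instead of TRUE, so no row passes the WHERE clause anymore.
conn.execute("INSERT INTO do_not_email_tbl VALUES (NULL)")
after_null = not_in_emails()

print(before_null)  # ['a@example.com', 'c@example.com']
print(after_null)   # []
```

If `do_not_email_address` should never be NULL, declaring it `NOT NULL` avoids the surprise. The planner limitation described above is a separate matter, though.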

UPDATE: Try changing your query to:

SELECT t.*
FROM get_customer_trans() AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM do_not_email_tbl AS d
    WHERE d.do_not_email_address = t.user_email
);

This query does not use NOT IN and should run fast. That said, I think that in PostgreSQL 9.2 it should perform about the same as the NOT IN version.
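Here is a minimal sketch of the NOT EXISTS rewrite. It again uses Python's sqlite3 module with hypothetical sample data, and a plain `customer_trans` table stands in for `get_customer_trans()`. It also creates the `do_not_email_tbl_idx1` index from above. Even with a NULL in the exclusion table, NOT EXISTS keeps exactly the rows that have no match. In PostgreSQL, a correlated NOT EXISTS like this one can be planned as an anti-join (hash, merge, or a nested loop probing the index), which is typically why it is faster. SQLite's planner is different, so this sketch checks the result, not the timing.

```python
import sqlite3

# Hypothetical sample data; note the NULL in the exclusion table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_trans (user_email TEXT);
    CREATE TABLE do_not_email_tbl (do_not_email_address TEXT);
    CREATE INDEX do_not_email_tbl_idx1 ON do_not_email_tbl (do_not_email_address);
    INSERT INTO customer_trans VALUES ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO do_not_email_tbl VALUES ('b@example.com'), (NULL);
""")

# Correlated NOT EXISTS: a customer row is kept when no exclusion row matches it.
# EXISTS stops at the first matching row, so no LIMIT is needed in the subquery.
NOT_EXISTS = """
    SELECT t.user_email
    FROM customer_trans AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM do_not_email_tbl AS d
        WHERE d.do_not_email_address = t.user_email
    )
    ORDER BY t.user_email
"""

kept = [row[0] for row in conn.execute(NOT_EXISTS)]
print(kept)  # ['a@example.com', 'c@example.com']
```

The NULL row never equals any address, so it simply never matches. This contrasts with NOT IN, where the same NULL filters out every row.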

mvp answered Nov 15 '22 at 07:11
