Here is an simple example for my problem: <pre class="prettyprint"><code>CREATE TABLE test1 (id SERIAL, key TEXT UNIQUE, value TEXT); CREATE TABLE test2 (id SERIAL, key TEXT UNIQUE, value TEXT); INSERT INTO test1 (key, value) SELECT i::TEXT, 'ABC' || i::TEXT FROM generate_series(0, 1000000) AS i; INSERT INTO test2 (key, value) SELECT i::TEXT, 'ABC' || (i+1000)::TEXT FROM generate_series(0, 600000) AS i; INSERT INTO test2 (key, value) SELECT i::TEXT, 'ABC' || (i+1000)::TEXT FROM generate_series(1000000, 1200000) AS i; CREATE INDEX test1_key ON test1 (key); CREATE INDEX test1_value ON test1 (value); CREATE INDEX test2_key ON test2 (key); CREATE INDEX test2_value ON test2 (value); VACUUM FULL VERBOSE ANALYZE test1; VACUUM FULL VERBOSE ANALYZE test2; </code></pre> That is the query i'm currently using, but which takes more than 6 seconds. <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM test1 LEFT OUTER JOIN test2 ON (test1.key = test2.key) WHERE test1.value = 'ABC1234' OR test2.value = 'ABC1234'; key1 | value1 | key2 | value2 ------+---------+------+--------- 234 | ABC234 | 234 | ABC1234 1234 | ABC1234 | 1234 | ABC2234 (2 rows) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Hash Left Join (cost=27344.05..79728.10 rows=2 width=32) (actual time=5428.635..6097.098 rows=2 loops=1) Hash Cond: (test1.key = test2.key) Filter: ((test1.value = 'ABC1234'::text) OR (test2.value = 'ABC1234'::text)) -> Seq Scan on test1 (cost=0.00..16321.01 rows=1000001 width=15) (actual time=0.009..1057.315 rows=1000001 loops=1) -> Hash (cost=13047.02..13047.02 rows=800002 width=17) (actual time=2231.964..2231.964 rows=800002 loops=1) Buckets: 65536 Batches: 2 Memory Usage: 14551kB -> Seq Scan on test2 (cost=0.00..13047.02 rows=800002 width=17) (actual time=0.010..980.232 rows=800002 loops=1) Total runtime: 6109.042 ms (8 rows) </code></pre> In both tables only very few datasets will meet the requirements, but it seems that fact isn't observed. I can instead use a query like that: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT coalesce(test1.key, test3.key1) AS key1, coalesce(test1.value, test3.value1) AS value1, coalesce(test2.key, test3.key2) AS key2, coalesce(test2.value, test3.value2) AS value2 FROM (SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM (SELECT key, value FROM test1 WHERE value = 'ABC1234') AS test1 FULL JOIN (SELECT key, value FROM test2 WHERE value = 'ABC1234') AS test2 ON (test1.key = test2.key)) AS test3 LEFT OUTER JOIN test1 ON (test1.key = test3.key2) LEFT OUTER JOIN test2 ON (test2.key = test3.key1) WHERE test1.key IS NOT NULL; key1 | value1 | key2 | value2 ------+---------+------+--------- 1234 | ABC1234 | 1234 | ABC2234 234 | ABC234 | 234 | ABC1234 (2 rows) QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop Left Join (cost=0.00..33.56 rows=1 width=64) (actual time=0.075..0.083 rows=1 loops=1) -> Nested Loop (cost=0.00..25.19 rows=1 width=47) (actual time=0.066..0.072 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..16.80 rows=1 width=32) (actual time=0.051..0.054 rows=1 loops=1) -> Index Scan using test2_value_key on test2 (cost=0.00..8.41 rows=1 width=17) (actual time=0.026..0.027 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test1_key on test1 (cost=0.00..8.38 rows=1 width=15) (actual time=0.020..0.020 rows=0 loops=1) Index Cond: (public.test1.key = public.test2.key) Filter: (public.test1.value = 'ABC1234'::text) -> Index Scan using test1_key on test1 (cost=0.00..8.38 rows=1 width=15) (actual time=0.011..0.013 rows=1 loops=1) Index Cond: ((public.test1.key IS NOT NULL) AND (public.test1.key = public.test2.key)) -> Index Scan using test2_key on test2 (cost=0.00..8.36 rows=1 width=17) (actual time=0.001..0.001 rows=0 loops=1) Index Cond: (public.test2.key = public.test1.key) Total runtime: 0.139 ms </code></pre> The following query is simpler, but still too slow: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM test1 LEFT OUTER JOIN test2 ON (test1.key = test2.key) WHERE test1.value = 'ABC1234' OR EXISTS (SELECT 1 FROM test2 t WHERE t.key = test1.key AND t.value = 'ABC1234'); key1 | value1 | key2 | value2 ------+---------+------+--------- 1234 | ABC1234 | 1234 | ABC2234 234 | ABC234 | 234 | ABC1234 (2 rows) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..8446826.32 rows=500001 width=32) (actual time=615.706..1651.370 rows=2 loops=1) Merge Cond: (test1.key = test2.key) -> Index Scan using test1_key on test1 (cost=0.00..8398983.25 rows=500001 width=15) (actual time=28.449..734.567 rows=2 loops=1) Filter: ((value = 'ABC1234'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2)) SubPlan 1 -> Index Scan using test2_key on test2 t (cost=0.00..8.36 rows=1 width=0) (never executed) Index Cond: (key = $0) Filter: (value = 'ABC1234'::text) SubPlan 2 -> Index Scan using test2_value on test2 t (cost=0.00..8.37 rows=1 width=7) (actual time=0.376..0.380 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test2_key on test2 (cost=0.00..39593.05 rows=800002 width=17) (actual time=0.019..498.456 rows=348894 loops=1) Total runtime: 1651.453 ms (13 rows) </code></pre> So my question is: Is there a simple query which will lead to a similar fast execution plan like the second query or maybe an index or some kind of hint for the planner. (I know for that example it would reasonable to only have one table with both values in it. But in reality the tables are more complicated and the table scheme can't be changed that easily.) <pre class="prettyprint"><code>PostgreSQL Version: 9.0.3 shared_buffers = 64MB effective_cache_size = 32MB work_mem = 16MB maintenance_work_mem = 32MB temp_buffers = 8MB wal_buffers= 1MB </code></pre> EDIT: As suggested from Kipotlov here is the UNION version. Why does the normal OR query not choose such a good plan? <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM test1 LEFT OUTER JOIN test2 ON (test1.key = test2.key) WHERE test1.value = 'ABC1234' UNION SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM test1 LEFT OUTER JOIN test2 ON (test1.key = test2.key) WHERE test2.value = 'ABC1234'; key1 | value1 | key2 | value2 ------+---------+------+--------- 1234 | ABC1234 | 1234 | ABC2234 234 | ABC234 | 234 | ABC1234 (2 rows) QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=33.64..33.66 rows=2 width=32) (actual time=0.114..0.119 rows=2 loops=1) -> Sort (cost=33.64..33.64 rows=2 width=32) (actual time=0.111..0.113 rows=2 loops=1) Sort Key: public.test1.key, public.test1.value, public.test2.key, public.test2.value Sort Method: quicksort Memory: 17kB -> Append (cost=0.00..33.63 rows=2 width=32) (actual time=0.046..0.097 rows=2 loops=1) -> Nested Loop Left Join (cost=0.00..16.81 rows=1 width=32) (actual time=0.044..0.050 rows=1 loops=1) -> Index Scan using test1_value_key on test1 (cost=0.00..8.44 rows=1 width=15) (actual time=0.023..0.024 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test2_key on test2 (cost=0.00..8.36 rows=1 width=17) (actual time=0.014..0.016 rows=1 loops=1) Index Cond: (public.test1.key = public.test2.key) -> Nested Loop (cost=0.00..16.80 rows=1 width=32) (actual time=0.036..0.041 rows=1 loops=1) -> Index Scan using test2_value_key on test2 (cost=0.00..8.41 rows=1 width=17) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test1_key on test1 (cost=0.00..8.38 rows=1 width=15) (actual time=0.013..0.015 rows=1 loops=1) Index Cond: (public.test1.key = public.test2.key) Total runtime: 0.173 ms (16 rows) </code></pre>

First of all, thanks for the very detailed question. It's rare to find people who have researched their problem into such detail before asking. I've been thinking about this and the problem seems to be that PostgreSQL wants to join all rows, because each non-matching row from test1 might potentially be matched in test2 -- and vice versa. The solution is forcing the planner to execute the query in two steps. One approach is the big UNION query you already tried -- to force it to consider each expression in a separate query. Another approach is forcing the planner to find matching keys first, then perform the join, so there can be no ambiguity: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT test1.key AS key1, test1.value AS value1, test2.key AS key2, test2.value AS value2 FROM ( SELECT key FROM test1 WHERE value='ABC1234' UNION SELECT key FROM test2 WHERE value='ABC1234' ) AS matching_keys INNER JOIN test1 USING (key) LEFT OUTER JOIN test2 USING (key); Nested Loop Left Join (cost=16.84..34.44 rows=2 width=32) (actual time=0.211..0.280 rows=2 loops=1) -> Nested Loop (cost=16.84..33.65 rows=2 width=15) (actual time=0.175..0.212 rows=2 loops=1) -> Unique (cost=16.84..16.85 rows=2 width=6) (actual time=0.132..0.136 rows=2 loops=1) -> Sort (cost=16.84..16.85 rows=2 width=6) (actual time=0.131..0.132 rows=2 loops=1) Sort Key: public.test1.key Sort Method: quicksort Memory: 25kB -> Append (cost=0.00..16.83 rows=2 width=6) (actual time=0.058..0.110 rows=2 loops=1) -> Index Scan using test1_value on test1 (cost=0.00..8.42 rows=1 width=6) (actual time=0.056..0.058 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test2_value on test2 (cost=0.00..8.39 rows=1 width=7) (actual time=0.046..0.047 rows=1 loops=1) Index Cond: (value = 'ABC1234'::text) -> Index Scan using test1_key on test1 (cost=0.00..8.38 rows=1 width=15) (actual time=0.032..0.033 rows=1 loops=2) Index Cond: (key = public.test1.key) -> Index Scan using test2_key on test2 (cost=0.00..0.38 rows=1 width=17) (actual time=0.028..0.029 rows=1 loops=2) Index Cond: (public.test1.key = key) Total runtime: 0.390 ms (16 rows) </code></pre> Again, UNION serves the role of OR. Unfortunately this approach still performs badly for queries like <code>value>'ABC1234'</code>. You can improve it somewhat by bumping up <code>work_mem</code>. I'm at a loss here. <hr> As for your last question: <blockquote> Why does the normal OR query not choose such a good plan? </blockquote> Because the PostgreSQL planner currently lacks the ability to split OR'ed expressions into separate UNION queries. There are a few caveats that make it harder than it might seem. The PostgreSQL planner is quite elaborate already, but so far it hasn't been a big priority take advantage of optimizations that are already possible by manually rewriting SQL.

Slow JOIN query with OR in WHERE statement

Tags:

performance

sql

database

join

postgresql

Here is an simple example for my problem:

CREATE TABLE test1 (id SERIAL, key TEXT UNIQUE, value TEXT);
CREATE TABLE test2 (id SERIAL, key TEXT UNIQUE, value TEXT);

INSERT INTO test1 (key, value) 
SELECT i::TEXT, 'ABC' || i::TEXT 
FROM generate_series(0, 1000000) AS i;

INSERT INTO test2 (key, value) 
SELECT i::TEXT, 'ABC' || (i+1000)::TEXT 
FROM generate_series(0,  600000) AS i;

INSERT INTO test2 (key, value) 
SELECT i::TEXT, 'ABC' || (i+1000)::TEXT 
FROM generate_series(1000000, 1200000) AS i;

CREATE INDEX test1_key ON test1 (key);
CREATE INDEX test1_value ON test1 (value);
CREATE INDEX test2_key ON test2 (key);
CREATE INDEX test2_value ON test2 (value);

VACUUM FULL VERBOSE ANALYZE test1;
VACUUM FULL VERBOSE ANALYZE test2;

That is the query i'm currently using, but which takes more than 6 seconds.

EXPLAIN ANALYZE 
SELECT test1.key AS key1, test1.value AS value1, 
       test2.key AS key2, test2.value AS value2
FROM test1 
LEFT OUTER JOIN test2 ON (test1.key = test2.key)
WHERE test1.value = 'ABC1234' OR test2.value = 'ABC1234';

 key1 | value1  | key2 | value2
------+---------+------+---------
 234  | ABC234  | 234  | ABC1234
 1234 | ABC1234 | 1234 | ABC2234
(2 rows)

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=27344.05..79728.10 rows=2 width=32) (actual time=5428.635..6097.098 rows=2 loops=1)
   Hash Cond: (test1.key = test2.key)
   Filter: ((test1.value = 'ABC1234'::text) OR (test2.value = 'ABC1234'::text))
   ->  Seq Scan on test1  (cost=0.00..16321.01 rows=1000001 width=15) (actual time=0.009..1057.315 rows=1000001 loops=1)
   ->  Hash  (cost=13047.02..13047.02 rows=800002 width=17) (actual time=2231.964..2231.964 rows=800002 loops=1)
         Buckets: 65536  Batches: 2  Memory Usage: 14551kB
         ->  Seq Scan on test2  (cost=0.00..13047.02 rows=800002 width=17) (actual time=0.010..980.232 rows=800002 loops=1)
 Total runtime: 6109.042 ms
(8 rows)

In both tables only very few datasets will meet the requirements, but it seems that fact isn't observed. I can instead use a query like that:

EXPLAIN ANALYZE 
SELECT coalesce(test1.key, test3.key1) AS key1, coalesce(test1.value, test3.value1) AS value1,
       coalesce(test2.key, test3.key2) AS key2, coalesce(test2.value, test3.value2) AS value2
FROM (SELECT test1.key AS key1, test1.value AS value1, 
             test2.key AS key2, test2.value AS value2
      FROM (SELECT key, value FROM test1 WHERE value = 'ABC1234') AS test1
      FULL JOIN (SELECT key, value FROM test2 WHERE value = 'ABC1234') AS test2
      ON (test1.key = test2.key)) AS test3
LEFT OUTER JOIN test1 ON (test1.key = test3.key2)
LEFT OUTER JOIN test2 ON (test2.key = test3.key1)
WHERE test1.key IS NOT NULL;

 key1 | value1  | key2 | value2
------+---------+------+---------
 1234 | ABC1234 | 1234 | ABC2234
 234  | ABC234  | 234  | ABC1234
(2 rows)

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..33.56 rows=1 width=64) (actual time=0.075..0.083 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..25.19 rows=1 width=47) (actual time=0.066..0.072 rows=1 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..16.80 rows=1 width=32) (actual time=0.051..0.054 rows=1 loops=1)
               ->  Index Scan using test2_value_key on test2  (cost=0.00..8.41 rows=1 width=17) (actual time=0.026..0.027 rows=1 loops=1)
                     Index Cond: (value = 'ABC1234'::text)
               ->  Index Scan using test1_key on test1  (cost=0.00..8.38 rows=1 width=15) (actual time=0.020..0.020 rows=0 loops=1)
                     Index Cond: (public.test1.key = public.test2.key)
                     Filter: (public.test1.value = 'ABC1234'::text)
         ->  Index Scan using test1_key on test1  (cost=0.00..8.38 rows=1 width=15) (actual time=0.011..0.013 rows=1 loops=1)
               Index Cond: ((public.test1.key IS NOT NULL) AND (public.test1.key = public.test2.key))
   ->  Index Scan using test2_key on test2  (cost=0.00..8.36 rows=1 width=17) (actual time=0.001..0.001 rows=0 loops=1)
         Index Cond: (public.test2.key = public.test1.key)
 Total runtime: 0.139 ms

The following query is simpler, but still too slow:

EXPLAIN ANALYZE
SELECT test1.key AS key1, test1.value AS value1, 
       test2.key AS key2, test2.value AS value2
FROM test1 
LEFT OUTER JOIN test2 ON (test1.key = test2.key)
WHERE test1.value = 'ABC1234'
   OR EXISTS (SELECT 1 FROM test2 t WHERE t.key = test1.key AND t.value = 'ABC1234');

 key1 | value1  | key2 | value2
------+---------+------+---------
 1234 | ABC1234 | 1234 | ABC2234
 234  | ABC234  | 234  | ABC1234
(2 rows)

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..8446826.32 rows=500001 width=32) (actual time=615.706..1651.370 rows=2 loops=1)
   Merge Cond: (test1.key = test2.key)
   ->  Index Scan using test1_key on test1  (cost=0.00..8398983.25 rows=500001 width=15) (actual time=28.449..734.567 rows=2 loops=1)
         Filter: ((value = 'ABC1234'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         SubPlan 1
           ->  Index Scan using test2_key on test2 t  (cost=0.00..8.36 rows=1 width=0) (never executed)
                 Index Cond: (key = $0)
                 Filter: (value = 'ABC1234'::text)
         SubPlan 2
           ->  Index Scan using test2_value on test2 t  (cost=0.00..8.37 rows=1 width=7) (actual time=0.376..0.380 rows=1 loops=1)
                 Index Cond: (value = 'ABC1234'::text)
   ->  Index Scan using test2_key on test2  (cost=0.00..39593.05 rows=800002 width=17) (actual time=0.019..498.456 rows=348894 loops=1)
 Total runtime: 1651.453 ms
(13 rows)

So my question is: Is there a simple query which will lead to a similar fast execution plan like the second query or maybe an index or some kind of hint for the planner.

(I know for that example it would reasonable to only have one table with both values in it. But in reality the tables are more complicated and the table scheme can't be changed that easily.)

PostgreSQL Version: 9.0.3
shared_buffers = 64MB
effective_cache_size = 32MB
work_mem = 16MB
maintenance_work_mem = 32MB
temp_buffers = 8MB
wal_buffers= 1MB

EDIT: As suggested from Kipotlov here is the UNION version. Why does the normal OR query not choose such a good plan?

EXPLAIN ANALYZE
SELECT test1.key AS key1, test1.value AS value1, 
       test2.key AS key2, test2.value AS value2
FROM test1 
LEFT OUTER JOIN test2 ON (test1.key = test2.key)
WHERE test1.value = 'ABC1234'
UNION
SELECT test1.key AS key1, test1.value AS value1, 
       test2.key AS key2, test2.value AS value2
FROM test1 
LEFT OUTER JOIN test2 ON (test1.key = test2.key)
WHERE test2.value = 'ABC1234';

 key1 | value1  | key2 | value2
------+---------+------+---------
 1234 | ABC1234 | 1234 | ABC2234
 234  | ABC234  | 234  | ABC1234
(2 rows)

                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=33.64..33.66 rows=2 width=32) (actual time=0.114..0.119 rows=2 loops=1)
   ->  Sort  (cost=33.64..33.64 rows=2 width=32) (actual time=0.111..0.113 rows=2 loops=1)
         Sort Key: public.test1.key, public.test1.value, public.test2.key, public.test2.value
         Sort Method:  quicksort  Memory: 17kB
         ->  Append  (cost=0.00..33.63 rows=2 width=32) (actual time=0.046..0.097 rows=2 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..16.81 rows=1 width=32) (actual time=0.044..0.050 rows=1 loops=1)
                     ->  Index Scan using test1_value_key on test1  (cost=0.00..8.44 rows=1 width=15) (actual time=0.023..0.024 rows=1 loops=1)
                           Index Cond: (value = 'ABC1234'::text)
                     ->  Index Scan using test2_key on test2  (cost=0.00..8.36 rows=1 width=17) (actual time=0.014..0.016 rows=1 loops=1)
                           Index Cond: (public.test1.key = public.test2.key)
               ->  Nested Loop  (cost=0.00..16.80 rows=1 width=32) (actual time=0.036..0.041 rows=1 loops=1)
                     ->  Index Scan using test2_value_key on test2  (cost=0.00..8.41 rows=1 width=17) (actual time=0.019..0.020 rows=1 loops=1)
                           Index Cond: (value = 'ABC1234'::text)
                     ->  Index Scan using test1_key on test1  (cost=0.00..8.38 rows=1 width=15) (actual time=0.013..0.015 rows=1 loops=1)
                           Index Cond: (public.test1.key = public.test2.key)
 Total runtime: 0.173 ms
(16 rows)

708

asked Mar 18 '11 14:03

rudi-moore

1 Answers

First of all, thanks for the very detailed question. It's rare to find people who have researched their problem into such detail before asking.

I've been thinking about this and the problem seems to be that PostgreSQL wants to join all rows, because each non-matching row from test1 might potentially be matched in test2 -- and vice versa.

The solution is forcing the planner to execute the query in two steps. One approach is the big UNION query you already tried -- to force it to consider each expression in a separate query.

Another approach is forcing the planner to find matching keys first, then perform the join, so there can be no ambiguity:

EXPLAIN ANALYZE
SELECT test1.key AS key1, test1.value AS value1, 
       test2.key AS key2, test2.value AS value2
FROM (
    SELECT key FROM test1 WHERE value='ABC1234'
    UNION SELECT key FROM test2 WHERE value='ABC1234'
) AS matching_keys
INNER JOIN test1 USING (key)
LEFT OUTER JOIN test2 USING (key);

 Nested Loop Left Join  (cost=16.84..34.44 rows=2 width=32) (actual time=0.211..0.280 rows=2 loops=1)
   ->  Nested Loop  (cost=16.84..33.65 rows=2 width=15) (actual time=0.175..0.212 rows=2 loops=1)
         ->  Unique  (cost=16.84..16.85 rows=2 width=6) (actual time=0.132..0.136 rows=2 loops=1)
               ->  Sort  (cost=16.84..16.85 rows=2 width=6) (actual time=0.131..0.132 rows=2 loops=1)
                     Sort Key: public.test1.key
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=0.00..16.83 rows=2 width=6) (actual time=0.058..0.110 rows=2 loops=1)
                           ->  Index Scan using test1_value on test1  (cost=0.00..8.42 rows=1 width=6) (actual time=0.056..0.058 rows=1 loops=1)
                                 Index Cond: (value = 'ABC1234'::text)
                           ->  Index Scan using test2_value on test2  (cost=0.00..8.39 rows=1 width=7) (actual time=0.046..0.047 rows=1 loops=1)
                                 Index Cond: (value = 'ABC1234'::text)
         ->  Index Scan using test1_key on test1  (cost=0.00..8.38 rows=1 width=15) (actual time=0.032..0.033 rows=1 loops=2)
               Index Cond: (key = public.test1.key)
   ->  Index Scan using test2_key on test2  (cost=0.00..0.38 rows=1 width=17) (actual time=0.028..0.029 rows=1 loops=2)
         Index Cond: (public.test1.key = key)
 Total runtime: 0.390 ms
(16 rows)

Again, UNION serves the role of OR. Unfortunately this approach still performs badly for queries like value>'ABC1234'. You can improve it somewhat by bumping up work_mem. I'm at a loss here.

As for your last question:

Why does the normal OR query not choose such a good plan?

Because the PostgreSQL planner currently lacks the ability to split OR'ed expressions into separate UNION queries. There are a few caveats that make it harder than it might seem.

The PostgreSQL planner is quite elaborate already, but so far it hasn't been a big priority take advantage of optimizations that are already possible by manually rewriting SQL.

177

answered Oct 05 '22 10:10

intgr

Related questions
                            
                                php mysql_insert_id on multiple rows?
                            
                                ROW_NUMBER() OVER Not Fast Enough With Large Result Set, any good solution?
                            
                                SQL Syntax to Pivot multiple tables
                            
                                sql server: Estimated number of rows is way off
                            
                                SQL Server units question
                            
                                Row locks - manually using them
                            
                                Running a script to create tables with HSQLDB
                            
                                Oracle MERGE does not INSERT
                            
                                Why does SQLite take such a long time to fetch the data?
                            
                                Parameterized SQL statements vs. very simple method
                            
                                SQL Replace Into question
                            
                                Why the most natural query(I.e. using INNER JOIN (instead of LEFT JOIN)) is very slow
                            
                                SQL Server Enterprise Manager - Mass Delete of Tables and Changing Ownership of Tables
                            
                                What's the SQL significance of 0x5E5B7D7E?
                            
                                Index a varchar column
                            
                                Include SQL file to another SQL file
                            
                                mySQL: is it possible to make this query any faster?
                            
                                PHPMyAdmin - Total record count varies
                            
                                Can someone explain this SQL query to me?
                            
                                find all tables not referenced in stored procedures

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With