Why is the "IN" operator so slow when used with subquery? <pre class="prettyprint"><code>select * from view1 where id in (1,2,3,4,5,6,7,8,9,10) order by somedata; </code></pre> executes in 9ms. <pre class="prettyprint"><code>select * from view1 where id in (select ext_id from aggregate_table order by somedata limit 10) order by somedata; </code></pre> executes in 25000ms and seems to use sequential scan on the view (<code>view1</code>) instead of index scan on primary keys returned by subquery as in does in first query. The subquery <code>select ext_id from aggregate_table order by somedata limit 10</code> executes in 0.1ms so the slowness of the second query is caused by sequential scan on <code>view1</code> which is a view containing three UNIONS and about three JOINS in each UNION. The first UNION contains about a 1M rows, others much less. Joins with tables with some 100K rows. That's not so relevant, though, I just wanted to understand the IN operator behaviour. What I'm trying to accomplish is to take the result of subquery (a set of primary keys) and select the data from a complex view (<code>view1</code>) using just them. I also cannot use <pre class="prettyprint"><code>select v1.* from view1 v1, aggregate_table at where v1.id = at.ext_id order by at.somedata limit 10 </code></pre> because I do not want to sort the big join by <code>somedata</code>. I just want to select 10 results from the view by primary keys and then sort only these. The question is why does IN operator perform fast when I explicitly list these keys and so slow when I use a fast subquery that returns the exact same set of keys? EXPLAIN ANALYZE as requested first query - <code>select * from view1 where id in (1,2,3,4,5,6,7,8,9,10) order by somedata;</code> <pre class="prettyprint"><code> Sort (cost=348.480..348.550 rows=30 width=943) (actual time=14.385..14.399 rows=10 loops=1) Sort Key: "india".three Sort Method: quicksort Memory: 30kB -> Append (cost=47.650..347.440 rows=30 width=334) (actual time=11.528..14.275 rows=10 loops=1) -> Subquery Scan "*SELECT* 1" (cost=47.650..172.110 rows=10 width=496) (actual time=11.526..12.301 rows=10 loops=1) -> Nested Loop (cost=47.650..172.010 rows=10 width=496) (actual time=11.520..12.268 rows=10 loops=1) -> Hash Join (cost=47.650..87.710 rows=10 width=371) (actual time=11.054..11.461 rows=10 loops=1) Hash Cond: (hotel.alpha_five = juliet_xray.alpha_five) -> Bitmap Heap Scan on sierra hotel (cost=42.890..82.800 rows=10 width=345) (actual time=10.835..11.203 rows=10 loops=1) Recheck Cond: (four = ANY ('quebec'::integer[])) -> Bitmap Index Scan on seven (cost=0.000..42.890 rows=10 width=0) (actual time=0.194..0.194 rows=10 loops=1) Index Cond: (four = ANY ('quebec'::integer[])) -> Hash (cost=4.340..4.340 rows=34 width=30) (actual time=0.184..0.184 rows=34 loops=1) -> Seq Scan on six juliet_xray (cost=0.000..4.340 rows=34 width=30) (actual time=0.029..0.124 rows=34 loops=1) -> Index Scan using charlie on juliet_two zulu (cost=0.000..8.390 rows=1 width=129) (actual time=0.065..0.067 rows=1 loops=10) Index Cond: (zulu.four = hotel.victor_whiskey) -> Subquery Scan "*SELECT* 2" (cost=4.760..97.420 rows=10 width=366) (actual time=0.168..0.168 rows=0 loops=1) -> Hash Join (cost=4.760..97.320 rows=10 width=366) (actual time=0.165..0.165 rows=0 loops=1) Hash Cond: (alpha_xray.alpha_five = juliet_xray2.alpha_five) -> Nested Loop (cost=0.000..92.390 rows=10 width=340) (actual time=0.162..0.162 rows=0 loops=1) -> Seq Scan on lima_echo alpha_xray (cost=0.000..8.340 rows=10 width=216) (actual time=0.159..0.159 rows=0 loops=1) Filter: (four = ANY ('quebec'::integer[])) -> Index Scan using charlie on juliet_two xray (cost=0.000..8.390 rows=1 width=128) (never executed) Index Cond: (zulu2.four = alpha_xray.victor_whiskey) -> Hash (cost=4.340..4.340 rows=34 width=30) (never executed) -> Seq Scan on six uniform (cost=0.000..4.340 rows=34 width=30) (never executed) -> Subquery Scan "*SELECT* 3" (cost=43.350..77.910 rows=10 width=141) (actual time=1.775..1.775 rows=0 loops=1) -> Hash Join (cost=43.350..77.810 rows=10 width=141) (actual time=1.771..1.771 rows=0 loops=1) Hash Cond: (golf.alpha_five = juliet_xray3.alpha_five) -> Bitmap Heap Scan on lima_golf golf (cost=38.590..72.910 rows=10 width=115) (actual time=0.110..0.110 rows=0 loops=1) Recheck Cond: (four = ANY ('quebec'::integer[])) -> Bitmap Index Scan on victor_hotel (cost=0.000..38.590 rows=10 width=0) (actual time=0.105..0.105 rows=0 loops=1) Index Cond: (four = ANY ('quebec'::integer[])) -> Hash (cost=4.340..4.340 rows=34 width=30) (actual time=0.118..0.118 rows=34 loops=1) -> Seq Scan on six victor_kilo (cost=0.000..4.340 rows=34 width=30) (actual time=0.007..0.063 rows=34 loops=1) Total runtime: 14.728 ms </code></pre> second query - <code>select * from view1 where id in (select ext_id from aggregate_table order by somedata limit 10) order by somedata;</code> <pre class="prettyprint"><code>Sort (cost=254515.780..254654.090 rows=55325 width=943) (actual time=24687.475..24687.488 rows=10 loops=1) Sort Key: "five".xray_alpha Sort Method: quicksort Memory: 30kB -> Hash Semi Join (cost=54300.820..250157.370 rows=55325 width=943) (actual time=11921.783..24687.308 rows=10 loops=1) Hash Cond: ("five".lima = "delta_echo".lima) -> Append (cost=54298.270..235569.720 rows=1106504 width=494) (actual time=3412.453..23091.938 rows=1106503 loops=1) -> Subquery Scan "*SELECT* 1" (cost=54298.270..234227.250 rows=1100622 width=496) (actual time=3412.450..20234.122 rows=1100622 loops=1) -> Hash Join (cost=54298.270..223221.030 rows=1100622 width=496) (actual time=3412.445..17078.021 rows=1100622 loops=1) Hash Cond: (three_victor.xray_hotel = delta_yankee.xray_hotel) -> Hash Join (cost=54293.500..180567.160 rows=1100622 width=470) (actual time=3412.251..12108.676 rows=1100622 loops=1) Hash Cond: (three_victor.tango_three = quebec_seven.lima) -> Seq Scan on india three_victor (cost=0.000..104261.220 rows=1100622 width=345) (actual time=0.015..3437.722 rows=1100622 loops=1) -> Hash (cost=44613.780..44613.780 rows=774378 width=129) (actual time=3412.031..3412.031 rows=774603 loops=1) -> Seq Scan on oscar quebec_seven (cost=0.000..44613.780 rows=774378 width=129) (actual time=4.142..1964.036 rows=774603 loops=1) -> Hash (cost=4.340..4.340 rows=34 width=30) (actual time=0.149..0.149 rows=34 loops=1) -> Seq Scan on alpha_kilo delta_yankee (cost=0.000..4.340 rows=34 width=30) (actual time=0.017..0.095 rows=34 loops=1) -> Subquery Scan "*SELECT* 2" (cost=4.760..884.690 rows=104 width=366) (actual time=7.846..10.161 rows=104 loops=1) -> Hash Join (cost=4.760..883.650 rows=104 width=366) (actual time=7.837..9.804 rows=104 loops=1) Hash Cond: (foxtrot.xray_hotel = delta_yankee2.xray_hotel) -> Nested Loop (cost=0.000..877.200 rows=104 width=340) (actual time=7.573..9.156 rows=104 loops=1) -> Seq Scan on four_india foxtrot (cost=0.000..7.040 rows=104 width=216) (actual time=0.081..0.311 rows=104 loops=1) -> Index Scan using three_delta on oscar alpha_victor (cost=0.000..8.350 rows=1 width=128) (actual time=0.077..0.078 rows=1 loops=104) Index Cond: (quebec_seven2.lima = foxtrot.tango_three) -> Hash (cost=4.340..4.340 rows=34 width=30) (actual time=0.216..0.216 rows=34 loops=1) -> Seq Scan on alpha_kilo quebec_foxtrot (cost=0.000..4.340 rows=34 width=30) (actual time=0.035..0.153 rows=34 loops=1) -> Subquery Scan "*SELECT* 3" (cost=4.760..457.770 rows=5778 width=141) (actual time=0.264..58.353 rows=5777 loops=1) -> Hash Join (cost=4.760..399.990 rows=5778 width=141) (actual time=0.253..39.062 rows=5777 loops=1) Hash Cond: (four_uniform.xray_hotel = delta_yankee3.xray_hotel) -> Seq Scan on whiskey four_uniform (cost=0.000..315.780 rows=5778 width=115) (actual time=0.112..15.759 rows=5778 loops=1) -> Hash (cost=4.340..4.340 rows=34 width=30) (actual time=0.117..0.117 rows=34 loops=1) -> Seq Scan on alpha_kilo golf (cost=0.000..4.340 rows=34 width=30) (actual time=0.005..0.059 rows=34 loops=1) -> Hash (cost=2.430..2.430 rows=10 width=4) (actual time=0.303..0.303 rows=10 loops=1) -> Subquery Scan "ANY_subquery" (cost=0.000..2.430 rows=10 width=4) (actual time=0.092..0.284 rows=10 loops=1) -> Limit (cost=0.000..2.330 rows=10 width=68) (actual time=0.089..0.252 rows=10 loops=1) -> Index Scan using tango_seven on zulu romeo (cost=0.000..257535.070 rows=1106504 width=68) (actual time=0.087..0.227 rows=10 loops=1) Total runtime: 24687.975 ms </code></pre>

I suspect <code>where id in (1,2,3,4,5,6,7,8,9,10)</code> can be optimised and <code>where id in (select ...)</code> can't, the reason being that <code>(1,2,3,4,5,6,7,8,9,10)</code> is a constant expression, while the <code>select</code> is not. How about: <pre class="prettyprint"><code>WITH myCTE AS ( SELECT ext_id FROM aggregate_table ORDER BY somedata LIMIT 10 ) SELECT * FROM myCTE LEFT JOIN table1 ON myCTE.ext_id = table1.id ORDER BY somedata </code></pre>

PostgreSQL IN operator with subquery poor performance

Tags:

performance

postgresql

operator-keyword

rdbms

Why is the "IN" operator so slow when used with subquery?

select *  from view1  where id in (1,2,3,4,5,6,7,8,9,10)  order by somedata;

executes in 9ms.

select *  from view1  where id in (select ext_id               from aggregate_table               order by somedata limit 10)  order by somedata;

executes in 25000ms and seems to use sequential scan on the view (view1) instead of index scan on primary keys returned by subquery as in does in first query.

The subquery select ext_id from aggregate_table order by somedata limit 10 executes in 0.1ms

so the slowness of the second query is caused by sequential scan on view1 which is a view containing three UNIONS and about three JOINS in each UNION. The first UNION contains about a 1M rows, others much less. Joins with tables with some 100K rows. That's not so relevant, though, I just wanted to understand the IN operator behaviour.

What I'm trying to accomplish is to take the result of subquery (a set of primary keys) and select the data from a complex view (view1) using just them.

I also cannot use

select v1.*  from view1 v1,       aggregate_table at  where v1.id = at.ext_id  order by at.somedata  limit 10

because I do not want to sort the big join by somedata. I just want to select 10 results from the view by primary keys and then sort only these.

The question is why does IN operator perform fast when I explicitly list these keys and so slow when I use a fast subquery that returns the exact same set of keys?

EXPLAIN ANALYZE as requested

first query - select * from view1 where id in (1,2,3,4,5,6,7,8,9,10) order by somedata;

    Sort  (cost=348.480..348.550 rows=30 width=943) (actual time=14.385..14.399 rows=10 loops=1)     Sort Key: "india".three     Sort Method:  quicksort  Memory: 30kB   ->  Append  (cost=47.650..347.440 rows=30 width=334) (actual time=11.528..14.275 rows=10 loops=1)         ->  Subquery Scan "*SELECT* 1"  (cost=47.650..172.110 rows=10 width=496) (actual time=11.526..12.301 rows=10 loops=1)               ->  Nested Loop  (cost=47.650..172.010 rows=10 width=496) (actual time=11.520..12.268 rows=10 loops=1)                     ->  Hash Join  (cost=47.650..87.710 rows=10 width=371) (actual time=11.054..11.461 rows=10 loops=1)                             Hash Cond: (hotel.alpha_five = juliet_xray.alpha_five)                           ->  Bitmap Heap Scan on sierra hotel  (cost=42.890..82.800 rows=10 width=345) (actual time=10.835..11.203 rows=10 loops=1)                                   Recheck Cond: (four = ANY ('quebec'::integer[]))                                 ->  Bitmap Index Scan on seven  (cost=0.000..42.890 rows=10 width=0) (actual time=0.194..0.194 rows=10 loops=1)                                         Index Cond: (four = ANY ('quebec'::integer[]))                           ->  Hash  (cost=4.340..4.340 rows=34 width=30) (actual time=0.184..0.184 rows=34 loops=1)                                 ->  Seq Scan on six juliet_xray  (cost=0.000..4.340 rows=34 width=30) (actual time=0.029..0.124 rows=34 loops=1)                     ->  Index Scan using charlie on juliet_two zulu  (cost=0.000..8.390 rows=1 width=129) (actual time=0.065..0.067 rows=1 loops=10)                             Index Cond: (zulu.four = hotel.victor_whiskey)         ->  Subquery Scan "*SELECT* 2"  (cost=4.760..97.420 rows=10 width=366) (actual time=0.168..0.168 rows=0 loops=1)               ->  Hash Join  (cost=4.760..97.320 rows=10 width=366) (actual time=0.165..0.165 rows=0 loops=1)                       Hash Cond: (alpha_xray.alpha_five = juliet_xray2.alpha_five)                     ->  Nested Loop  (cost=0.000..92.390 rows=10 width=340) (actual time=0.162..0.162 rows=0 loops=1)                           ->  Seq Scan on lima_echo alpha_xray  (cost=0.000..8.340 rows=10 width=216) (actual time=0.159..0.159 rows=0 loops=1)                                   Filter: (four = ANY ('quebec'::integer[]))                           ->  Index Scan using charlie on juliet_two xray  (cost=0.000..8.390 rows=1 width=128) (never executed)                                   Index Cond: (zulu2.four = alpha_xray.victor_whiskey)                     ->  Hash  (cost=4.340..4.340 rows=34 width=30) (never executed)                           ->  Seq Scan on six uniform  (cost=0.000..4.340 rows=34 width=30) (never executed)         ->  Subquery Scan "*SELECT* 3"  (cost=43.350..77.910 rows=10 width=141) (actual time=1.775..1.775 rows=0 loops=1)               ->  Hash Join  (cost=43.350..77.810 rows=10 width=141) (actual time=1.771..1.771 rows=0 loops=1)                       Hash Cond: (golf.alpha_five = juliet_xray3.alpha_five)                     ->  Bitmap Heap Scan on lima_golf golf  (cost=38.590..72.910 rows=10 width=115) (actual time=0.110..0.110 rows=0 loops=1)                             Recheck Cond: (four = ANY ('quebec'::integer[]))                           ->  Bitmap Index Scan on victor_hotel  (cost=0.000..38.590 rows=10 width=0) (actual time=0.105..0.105 rows=0 loops=1)                                   Index Cond: (four = ANY ('quebec'::integer[]))                     ->  Hash  (cost=4.340..4.340 rows=34 width=30) (actual time=0.118..0.118 rows=34 loops=1)                           ->  Seq Scan on six victor_kilo  (cost=0.000..4.340 rows=34 width=30) (actual time=0.007..0.063 rows=34 loops=1)  Total runtime: 14.728 ms

second query - select * from view1 where id in (select ext_id from aggregate_table order by somedata limit 10) order by somedata;

Sort  (cost=254515.780..254654.090 rows=55325 width=943) (actual time=24687.475..24687.488 rows=10 loops=1)     Sort Key: "five".xray_alpha     Sort Method:  quicksort  Memory: 30kB   ->  Hash Semi Join  (cost=54300.820..250157.370 rows=55325 width=943) (actual time=11921.783..24687.308 rows=10 loops=1)           Hash Cond: ("five".lima = "delta_echo".lima)         ->  Append  (cost=54298.270..235569.720 rows=1106504 width=494) (actual time=3412.453..23091.938 rows=1106503 loops=1)               ->  Subquery Scan "*SELECT* 1"  (cost=54298.270..234227.250 rows=1100622 width=496) (actual time=3412.450..20234.122 rows=1100622 loops=1)                     ->  Hash Join  (cost=54298.270..223221.030 rows=1100622 width=496) (actual time=3412.445..17078.021 rows=1100622 loops=1)                             Hash Cond: (three_victor.xray_hotel = delta_yankee.xray_hotel)                           ->  Hash Join  (cost=54293.500..180567.160 rows=1100622 width=470) (actual time=3412.251..12108.676 rows=1100622 loops=1)                                   Hash Cond: (three_victor.tango_three = quebec_seven.lima)                                 ->  Seq Scan on india three_victor  (cost=0.000..104261.220 rows=1100622 width=345) (actual time=0.015..3437.722 rows=1100622 loops=1)                                 ->  Hash  (cost=44613.780..44613.780 rows=774378 width=129) (actual time=3412.031..3412.031 rows=774603 loops=1)                                       ->  Seq Scan on oscar quebec_seven  (cost=0.000..44613.780 rows=774378 width=129) (actual time=4.142..1964.036 rows=774603 loops=1)                           ->  Hash  (cost=4.340..4.340 rows=34 width=30) (actual time=0.149..0.149 rows=34 loops=1)                                 ->  Seq Scan on alpha_kilo delta_yankee  (cost=0.000..4.340 rows=34 width=30) (actual time=0.017..0.095 rows=34 loops=1)               ->  Subquery Scan "*SELECT* 2"  (cost=4.760..884.690 rows=104 width=366) (actual time=7.846..10.161 rows=104 loops=1)                     ->  Hash Join  (cost=4.760..883.650 rows=104 width=366) (actual time=7.837..9.804 rows=104 loops=1)                             Hash Cond: (foxtrot.xray_hotel = delta_yankee2.xray_hotel)                           ->  Nested Loop  (cost=0.000..877.200 rows=104 width=340) (actual time=7.573..9.156 rows=104 loops=1)                                 ->  Seq Scan on four_india foxtrot  (cost=0.000..7.040 rows=104 width=216) (actual time=0.081..0.311 rows=104 loops=1)                                 ->  Index Scan using three_delta on oscar alpha_victor  (cost=0.000..8.350 rows=1 width=128) (actual time=0.077..0.078 rows=1 loops=104)                                         Index Cond: (quebec_seven2.lima = foxtrot.tango_three)                           ->  Hash  (cost=4.340..4.340 rows=34 width=30) (actual time=0.216..0.216 rows=34 loops=1)                                 ->  Seq Scan on alpha_kilo quebec_foxtrot  (cost=0.000..4.340 rows=34 width=30) (actual time=0.035..0.153 rows=34 loops=1)               ->  Subquery Scan "*SELECT* 3"  (cost=4.760..457.770 rows=5778 width=141) (actual time=0.264..58.353 rows=5777 loops=1)                     ->  Hash Join  (cost=4.760..399.990 rows=5778 width=141) (actual time=0.253..39.062 rows=5777 loops=1)                             Hash Cond: (four_uniform.xray_hotel = delta_yankee3.xray_hotel)                           ->  Seq Scan on whiskey four_uniform  (cost=0.000..315.780 rows=5778 width=115) (actual time=0.112..15.759 rows=5778 loops=1)                           ->  Hash  (cost=4.340..4.340 rows=34 width=30) (actual time=0.117..0.117 rows=34 loops=1)                                 ->  Seq Scan on alpha_kilo golf  (cost=0.000..4.340 rows=34 width=30) (actual time=0.005..0.059 rows=34 loops=1)         ->  Hash  (cost=2.430..2.430 rows=10 width=4) (actual time=0.303..0.303 rows=10 loops=1)               ->  Subquery Scan "ANY_subquery"  (cost=0.000..2.430 rows=10 width=4) (actual time=0.092..0.284 rows=10 loops=1)                     ->  Limit  (cost=0.000..2.330 rows=10 width=68) (actual time=0.089..0.252 rows=10 loops=1)                           ->  Index Scan using tango_seven on zulu romeo  (cost=0.000..257535.070 rows=1106504 width=68) (actual time=0.087..0.227 rows=10 loops=1)  Total runtime: 24687.975 ms

652

asked Feb 20 '13 18:02

Snifff

2 Answers

Seems that I have finally found a solution:

select *    from view1    where view1.id = ANY(                        (select array(select ext_id                                       from aggregate_table                                       order by somedata limit 10)                        )::integer[]                       )    order by view1.somedata;

After elaborating @Dukeling's idea:

I suspect where id in (1,2,3,4,5,6,7,8,9,10) can be optimised and where id in (select ...) can't, the reason being that (1,2,3,4,5,6,7,8,9,10) is a constant expression, while the select is not.

and locating these in faster query plan

Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[])) Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))

this works even faster than the first query in the question, about 1.2ms, and now it uses

Recheck Cond: (id = ANY ($1)) Index Cond: (id = ANY ($1))

and bitmap scans in the plan.

133

answered Sep 28 '22 02:09

Snifff

I suspect where id in (1,2,3,4,5,6,7,8,9,10) can be optimised and where id in (select ...) can't, the reason being that (1,2,3,4,5,6,7,8,9,10) is a constant expression, while the select is not.

How about:

WITH myCTE AS (   SELECT ext_id   FROM aggregate_table   ORDER BY somedata   LIMIT 10 ) SELECT * FROM myCTE LEFT JOIN table1   ON myCTE.ext_id = table1.id ORDER BY somedata

answered Sep 28 '22 02:09

Bernhard Barker

Related questions
                            
                                Aggregate vs Sum Performance in LINQ
                            
                                C# List<T>.ToArray performance is bad?
                            
                                Android Studio got very slow and constantly "updating indices"
                            
                                Python string 'join' is faster (?) than '+', but what's wrong here?
                            
                                How long does it take for you to be comfortable with Haskell?
                            
                                Can not increase max_open_files for Mysql max-connections in Ubuntu 15
                            
                                Performance of Google's Go?
                            
                                ?: Operator Vs. If Statement Performance
                            
                                What is the most performant way for dynamic styling in React-Native?
                            
                                Improve pandas (PyTables?) HDF5 table write performance
                            
                                How well does Git LFS handle small files?
                            
                                Branchless internal merge slower than internal merge with branch
                            
                                Why does JIT order affect performance?
                            
                                How to speed up Android Studio compilation process
                            
                                What is the benefit of setting java.awt.headless=true?
                            
                                Core Graphics Performance on iOS
                            
                                Are there any good Clojure benchmarks?
                            
                                Android: Use UUID as primary key in SQLite
                            
                                Is there any runtime cost for Casting in Java?
                            
                                How slow (how many cycles) is calculating a square root?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

PostgreSQL IN operator with subquery poor performance

Tags:

performance

postgresql

operator-keyword

rdbms

Snifff

People also ask

2 Answers

Snifff

Bernhard Barker

Recent Activity

Donate For Us