Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres query is very slow when using a parameter instead of an hardcoded string

I am running into this Postgres issue where the same query takes a long time to execute if I use a parameter vs hardcoding its value on the query string. The column name is 'media_type' and it's a VARCHAR(20). I am running these queries from PHP, using Symfony2 and Doctrine2 ORM and the table in question has about 1.000.000 records.

Do I have a problem on my query? Can it be a Postgres configuration problem?

1 - Hardcoded value for media_type

duration: 5.365 ms  parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
duration: 0.142 ms  bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'
duration: 8.667 ms  execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'

Execution plan:

duration: 8.640 ms  plan:
    Query Text: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
    Limit  (cost=8.38..8.38 rows=1 width=12) (actual time=8.516..8.595 rows=24 loops=1)
      Buffers: shared hit=10 read=15
        ->  Sort  (cost=8.38..8.38 rows=1 width=12) (actual time=8.505..8.530 rows=24 loops=1)
            Sort Key: id
            Sort Method: quicksort  Memory: 26kB
            Buffers: shared hit=10 read=15
            ->  Index Scan using item_media_type_index on item  (cost=0.00..8.37 rows=1 width=12) (actual time=7.955..8.397 rows=24 loops=1)
                    Index Cond: ((media_type)::text = 'Collection'::text)
                    Filter: (enabled AND (site_id = $1) AND (user_id = $2))
                    Buffers: shared hit=8 read=15

2 - Using a parameter for media_type (SLOWER)

duration: 5.557 ms  parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
duration: 1.322 ms  bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'
duration: 71564.998 ms  execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'

Execution plan:

duration: 71564.922 ms  plan:
    Query Text: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
    Limit  (cost=90663.16..181326.31 rows=17184 width=12) (actual time=3.667..71564.864 rows=24 loops=1)
      Buffers: shared hit=183786 read=96585
        ->  Index Scan Backward using item_pkey on item  (cost=0.00..906610.46 rows=171836 width=12) (actual time=3.655..71564.798 rows=24 loops=1)
               Filter: (enabled AND ((media_type)::text = $1) AND (site_id = $2) AND (user_id = $3))
               Buffers: shared hit=183786 read=96585

Thanks in advance.

like image 441
luis Avatar asked May 30 '12 22:05

luis


3 Answers

This is a bit of a long-standing wart in PostgreSQL that's historically required some interesting planner tuning to work around. It's fixed in PostgreSQL 9.2 (in beta now) though thanks, as usual, to Tom Lane.

E.1.3.1.3. Optimizer

Improve the ability of the planner to choose parameterized plans (Tom Lane)

A prepared statement is now parsed, analyzed, and rewritten, but not necessarily planned. When the prepared plan is executed with parameters, the planner might replan it for every constant, or it might execute a generic plan if its cost is close to that of a constant-specific plan.

See the 9.2 beta release notes and a quick note I wrote about this on lwn.net. There's lots of info about handling prepared / parameterised statements running slower than normal ones on the mailing lists.

like image 193
Craig Ringer Avatar answered Sep 30 '22 14:09

Craig Ringer


I experienced a very similar issue once when I was binding to a SMALLINT field and passing in a value that Postgres was implicitly casting from INTEGER to SMALLINT. I fixed it by making the cast explicit. Since media_type is type VARCHAR(20), Postgres is doing an implicit cast from type TEXT. Try this:

where media_type = $1::VARCHAR(20)

like image 27
Sam Choukri Avatar answered Sep 30 '22 16:09

Sam Choukri


On your static, faster query, the item_media_type_index is being used. On your bound, slower query, the item_media_type_index is not being used.

What is the selectivity of the "media_type" column in the "item" table? If you did a:

SELECT media_type, COUNT(*)
  FROM item
  GROUP BY media_type
  ORDER BY 2 desc

Are the media_types evenly balanced or are there relatively few 'Collection' media_types compared to the others? If there are relatively few 'Collection' items then I would hazard this guess: with the static query, the parser knows you are querying for 'Collection' and can determine that 'Collection' has a low count and an index is probably worth using. But in the bind variable case, the parser doesn't know which media_type you are using. Some other media_type value might account for a high proportion of the records in the table (say 20%). In that case, it would be faster to do a scan than to even bother with using the index. The parser needs to make a decision, and it happens to decide not to use the index (wrong for your case, but probably correct for the other media_type). This is just a guess based on how some other rdbms work.

In this case, the answer is to hard code if you know the selectivity attributes are very skewed, use dynamic sql to force late parsing, or force index usage if you think that is right.

like image 39
Glenn Avatar answered Sep 30 '22 15:09

Glenn