I am running into this Postgres issue where the same query takes a long time to execute if I use a parameter vs hardcoding its value on the query string. The column name is 'media_type' and it's a VARCHAR(20). I am running these queries from PHP, using Symfony2 and Doctrine2 ORM and the table in question has about 1.000.000 records.
Do I have a problem on my query? Can it be a Postgres configuration problem?
1 - Hardcoded value for media_type
duration: 5.365 ms parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
duration: 0.142 ms bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'
duration: 8.667 ms execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'
Execution plan:
duration: 8.640 ms plan:
Query Text: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
Limit (cost=8.38..8.38 rows=1 width=12) (actual time=8.516..8.595 rows=24 loops=1)
Buffers: shared hit=10 read=15
-> Sort (cost=8.38..8.38 rows=1 width=12) (actual time=8.505..8.530 rows=24 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10 read=15
-> Index Scan using item_media_type_index on item (cost=0.00..8.37 rows=1 width=12) (actual time=7.955..8.397 rows=24 loops=1)
Index Cond: ((media_type)::text = 'Collection'::text)
Filter: (enabled AND (site_id = $1) AND (user_id = $2))
Buffers: shared hit=8 read=15
2 - Using a parameter for media_type (SLOWER)
duration: 5.557 ms parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
duration: 1.322 ms bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'
duration: 71564.998 ms execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'
Execution plan:
duration: 71564.922 ms plan:
Query Text: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
Limit (cost=90663.16..181326.31 rows=17184 width=12) (actual time=3.667..71564.864 rows=24 loops=1)
Buffers: shared hit=183786 read=96585
-> Index Scan Backward using item_pkey on item (cost=0.00..906610.46 rows=171836 width=12) (actual time=3.655..71564.798 rows=24 loops=1)
Filter: (enabled AND ((media_type)::text = $1) AND (site_id = $2) AND (user_id = $3))
Buffers: shared hit=183786 read=96585
Thanks in advance.
This is a bit of a long-standing wart in PostgreSQL that's historically required some interesting planner tuning to work around. It's fixed in PostgreSQL 9.2 (in beta now) though thanks, as usual, to Tom Lane.
E.1.3.1.3. Optimizer
Improve the ability of the planner to choose parameterized plans (Tom Lane)
A prepared statement is now parsed, analyzed, and rewritten, but not necessarily planned. When the prepared plan is executed with parameters, the planner might replan it for every constant, or it might execute a generic plan if its cost is close to that of a constant-specific plan.
See the 9.2 beta release notes and a quick note I wrote about this on lwn.net. There's lots of info about handling prepared / parameterised statements running slower than normal ones on the mailing lists.
I experienced a very similar issue once when I was binding to a SMALLINT field and passing in a value that Postgres was implicitly casting from INTEGER to SMALLINT. I fixed it by making the cast explicit. Since media_type is type VARCHAR(20), Postgres is doing an implicit cast from type TEXT. Try this:
where media_type = $1::VARCHAR(20)
On your static, faster query, the item_media_type_index is being used. On your bound, slower query, the item_media_type_index is not being used.
What is the selectivity of the "media_type" column in the "item" table? If you did a:
SELECT media_type, COUNT(*)
FROM item
GROUP BY media_type
ORDER BY 2 desc
Are the media_types evenly balanced or are there relatively few 'Collection' media_types compared to the others? If there are relatively few 'Collection' items then I would hazard this guess: with the static query, the parser knows you are querying for 'Collection' and can determine that 'Collection' has a low count and an index is probably worth using. But in the bind variable case, the parser doesn't know which media_type you are using. Some other media_type value might account for a high proportion of the records in the table (say 20%). In that case, it would be faster to do a scan than to even bother with using the index. The parser needs to make a decision, and it happens to decide not to use the index (wrong for your case, but probably correct for the other media_type). This is just a guess based on how some other rdbms work.
In this case, the answer is to hard code if you know the selectivity attributes are very skewed, use dynamic sql to force late parsing, or force index usage if you think that is right.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With