My queries get very slow when I add a LIMIT 1.
I have a table object_values with timestamped values for objects:

 timestamp  | objectID | value
------------+----------+--------
 2014-01-27 |      234 | ksghdf
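For reference, a minimal definition that matches the sample row (the column types are assumptions; only the data is shown above):

-- Hypothetical table definition; types assumed from the sample row
CREATE TABLE object_values (
    timestamp  timestamptz,
    objectID   integer,
    value      text
);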
Per object I want to get the latest value:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;
This query is very slow when there are no values for the given objectID (I cancelled it after more than 10 minutes); it is fast if there are results. If I remove the limit, it tells me almost instantly that there are no results:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
...
Time: 0.463 ms
An EXPLAIN shows me that the query without the limit uses the index, whereas the query with LIMIT 1 does not make use of the index:
Slow query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2350.44 rows=1 width=126)
   ->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
         Filter: (objectID = 53708)
Fast query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)
The table contains 44,884,559 rows and 66,762 distinct objectIDs. I have separate indexes on both fields: timestamp and objectID.
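Assuming the index names that appear in the EXPLAIN output above, the two indexes would have been created roughly like this:

-- Two single-column indexes, one per field (names taken from the query plans)
CREATE INDEX object_values_timestamp ON object_values (timestamp);
CREATE INDEX object_values_objectID  ON object_values (objectID);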
I have run a VACUUM ANALYZE on the table and I have reindexed it.
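For reference, those maintenance commands are:

VACUUM ANALYZE object_values;
REINDEX TABLE object_values;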
Additionally, the slow query becomes fast when I set the limit to 3 or higher:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)
In general, I assume this has to do with the planner making wrong assumptions about the execution costs and therefore choosing a slower execution plan.
Is this the real reason? Is there a solution for this?
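One way to see which row counts the planner is working with is to compare its estimate against the column statistics it is derived from (a sketch; pg_stats is the standard statistics view, and the SET STATISTICS value is only an example):

-- Row estimate the planner uses for this objectID (compare against the real count)
EXPLAIN SELECT * FROM object_values WHERE objectID = 53708;

-- Per-column statistics the estimate is based on
SELECT attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'object_values';

-- If the estimates are far off, more detailed statistics sometimes help
ALTER TABLE object_values ALTER COLUMN objectID SET STATISTICS 1000;
ANALYZE object_values;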
PostgreSQL attempts to do a lot of its work in memory and spreads writes to disk out over time to minimize bottlenecks, but on an overloaded system with heavy writing, heavy reads and writes can slow the whole system down as it catches up on the demand.
The PostgreSQL LIMIT clause is used to get a subset of the rows generated by a query. It is an optional clause of the SELECT statement. The LIMIT clause can be combined with the OFFSET clause to skip a specific number of rows before returning the limited result set.
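For example (the values here are illustrative only):

-- Skip the first 20 rows, then return the next 10
SELECT *
FROM object_values
ORDER BY timestamp DESC
LIMIT 10 OFFSET 20;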
A more traditional way to attack slow queries is to make use of PostgreSQL's slow query log. The idea: if a query takes longer than a certain amount of time, a line is written to the log, so slow queries can easily be spotted and developers and administrators know where to look. In a default configuration the slow query log is not active.

When a query takes too long for whatever reason, auto_explain can be used as well. The idea here: if a query exceeds a certain threshold, PostgreSQL writes its execution plan to the logfile for later inspection. The LOAD command loads the auto_explain module into a database connection.
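A minimal configuration sketch for both (the thresholds are arbitrary examples; ALTER SYSTEM and LOAD require superuser rights):

-- Slow query log: log every statement that runs longer than 1 second
ALTER SYSTEM SET log_min_duration_statement = '1s';
SELECT pg_reload_conf();

-- auto_explain for the current connection: log the plan of anything over 500 ms
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_analyze = on;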
You can avoid this issue by adding an unneeded second column to the ORDER BY clause. Because the requested ordering then no longer matches the timestamp index exactly, the planner stops choosing the backward scan of that index and falls back to the objectID index:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC, objectID LIMIT 1;
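Another common fix for this access pattern (not part of the answer above, and the index name is just an example) is a composite index that covers both the filter and the sort, so the original LIMIT 1 query can be served directly:

-- Composite index matching: WHERE objectID = ? ORDER BY timestamp DESC
CREATE INDEX object_values_objectid_ts
    ON object_values (objectID, timestamp DESC);

SELECT *
FROM object_values
WHERE objectID = 53708
ORDER BY timestamp DESC
LIMIT 1;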