I have a table with a datetime field "updated_at". A lot of my queries will be range queries on this field, such as selecting rows with updated_at > a certain date.
I already added an index on updated_at, but most of my queries are still very slow, even when I add a limit to the number of rows returned.
What else can I do to optimize queries that query on datetime fields?
Usually database optimizers won't choose to use indexes for open-ended ranges, such as updated_at > somedate.
But in many cases the datetime column won't exceed "now", so you can preserve the semantics of > somedate by converting the condition to a closed range using between, like this:
where updated_at between somedate and current_timestamp
A between predicate is much more likely to cause the optimizer to choose an index.
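As a sketch, assuming a table named events with an updated_at column (both names are hypothetical placeholders for your own schema):

```sql
-- Hypothetical table and index; adjust the names to your schema.
CREATE INDEX idx_events_updated_at ON events (updated_at);

-- Open-ended range: the planner may fall back to a sequential scan.
SELECT * FROM events WHERE updated_at > '2012-01-01';

-- Closed range: returns the same rows as long as updated_at never
-- exceeds the current time, and is more likely to use the index.
SELECT * FROM events
WHERE updated_at BETWEEN '2012-01-01' AND current_timestamp;
```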
Please post if this approach improved your query’s performance.
For any given query, the use of an index depends on the cost of using that index compared to a sequential scan.
Frequently developers think that because there is an index, a query should run faster, and if a query runs slowly, an index is the solution. This is usually true when the query will return few tuples. But as the number of tuples in the result increases, the cost of using an index can increase until it exceeds the cost of a sequential scan.
You are using Postgres. Postgres does not maintain clustering on a given attribute (the CLUSTER command orders the table once, but the ordering is not kept up to date as rows change). That means that Postgres, when confronted with a range query (of the type att > a and att < b), needs to compute an estimate of the number of tuples in the result (make sure you VACUUM and ANALYZE your database frequently so these statistics stay accurate) and the cost of using an index compared to doing a sequential scan. It will then decide which method to use.
You can inspect this decision by running
EXPLAIN ANALYZE <query>;
in psql. It will tell you if it uses an index or not.
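For example, using the hypothetical events table from above (the plan output is abbreviated and illustrative; the exact node types and numbers depend on your data and statistics):

```sql
EXPLAIN ANALYZE
SELECT * FROM events
WHERE updated_at BETWEEN '2012-01-01' AND current_timestamp;

-- In the resulting plan, look for a line like one of these:
--   Index Scan using idx_events_updated_at on events ...
--   Seq Scan on events ...  Filter: (updated_at >= ...)
```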
If you really, really want to use the indexes instead of a sequential scan (sometimes it is needed) and you really really know what you are doing, you can change the cost of a sequential scan in the planner constants or disable sequential scans in favor of any other method. See this page for the details:
http://www.postgresql.org/docs/9.1/static/runtime-config-query.html
Make sure you browse the correct version of the documentation.
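A minimal sketch of such an experiment, again using the hypothetical events table (enable_seqscan and random_page_cost are real planner settings; the query and values are illustrative):

```sql
-- Session-scoped experiment only; do not change these globally
-- without measuring. enable_seqscan = off does not forbid sequential
-- scans, it just makes the planner penalize them heavily.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM events WHERE updated_at > '2012-01-01';
RESET enable_seqscan;

-- Alternatively, lower random_page_cost (e.g. on SSDs) so index
-- scans look relatively cheaper to the cost model:
SET random_page_cost = 1.1;
```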
--dmg