
Postgres: Optimizing querying by datetime

I have a table with a datetime field "updated_at". Many of my queries use range predicates on this field, such as selecting rows where updated_at > a certain date.

I have already added an index on updated_at, but most of these queries are still very slow, even when I add a limit to the number of rows returned.

What else can I do to optimize queries that filter on datetime fields?

Henley asked May 19 '13 21:05



2 Answers

Usually database optimizers won't choose to use indexes for open-ended ranges, such as updated_at > somedate.

But in many cases the datetime column won't exceed "now", so you can preserve the semantics of > somedate by converting the condition into a closed range using between, like this:

where updated_at between somedate and current_timestamp

A between predicate is much more likely to cause the optimizer to choose an index.
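As a concrete sketch of this rewrite, using a hypothetical table name and assuming an index already exists on updated_at:

```sql
-- Open-ended form, which the planner may estimate poorly:
--   SELECT * FROM events WHERE updated_at > '2013-05-01' LIMIT 100;

-- Closed-range form preserving the same semantics
-- (valid as long as no updated_at value lies in the future):
SELECT *
FROM events
WHERE updated_at BETWEEN '2013-05-01' AND current_timestamp
LIMIT 100;
```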


Please post if this approach improved your query’s performance.

like image 139
Bohemian Avatar answered Sep 16 '22 14:09

Bohemian


For any given query, the use of an index depends on the cost of using that index compared to a sequential scan.

Frequently developers think that because there is an index, a query should run faster, and that if a query runs slowly, an index is the solution. This is usually true when the query returns few tuples. But as the number of tuples in the result grows, the cost of using the index may exceed that of a sequential scan.

You are using Postgres. Postgres does not maintain clustering on a given attribute (the CLUSTER command reorders a table once, but the ordering is not maintained as rows change). That means that Postgres, when confronted with a range query (of the type att > a and att < b), needs to compute an estimate of the number of tuples in the result (make sure you vacuum and analyze your database frequently so these statistics stay accurate) and the cost of using an index compared to doing a sequential scan. It then decides which method to use.
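For example, refreshing those statistics looks like this (the table name is illustrative):

```sql
-- ANALYZE refreshes the planner's row-count statistics;
-- VACUUM ANALYZE also reclaims dead tuples at the same time.
VACUUM ANALYZE events;
```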

You can inspect this decision by running

EXPLAIN ANALYZE <query>; 

in psql. The plan it prints will tell you whether an index is used or not.

If you really, really want the planner to use an index instead of a sequential scan (sometimes it is needed), and you really, really know what you are doing, you can change the planner cost constants or disable sequential scans in favor of other methods. See this page for the details:

http://www.postgresql.org/docs/9.1/static/runtime-config-query.html

Make sure you browse the version of the documentation that matches your server.
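As an illustrative sketch (table name assumed), you can disable sequential scans for a single session while testing, then re-enable them:

```sql
-- Session-local planner setting; useful for diagnosis,
-- not recommended as a permanent fix.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM events WHERE updated_at > '2013-05-01';
SET enable_seqscan = on;
```

Comparing the plan with and without enable_seqscan shows you how much the planner thinks the index actually helps.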

--dmg

dmg answered Sep 17 '22 14:09