Optimize Postgres timestamp query range

Tags: indexing, postgresql, query-optimization, database-partitioning, postgresql-performance

I have the following table and indices defined:

CREATE TABLE ticket (
  wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
  eid bigint,
  created timestamp with time zone NOT NULL DEFAULT now(),
  status integer NOT NULL DEFAULT 0,
  argsxml text,
  moduleid character varying(255),
  source_id bigint,
  file_type_id bigint,
  file_name character varying(255),
  status_reason character varying(255),
  ...
)

I created an index on the created timestamp as follows:

CREATE INDEX ticket_1_idx
  ON ticket
  USING btree
  (created );

Here's my query:

select * from ticket 
where created between '2012-12-19 00:00:00' and  '2012-12-20 00:00:00'

This was working fine until the number of records started to grow (about 5 million) and now it's taking forever to return.

EXPLAIN ANALYZE reveals this:

Index Scan using ticket_1_idx on ticket  (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)
  Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))
Total runtime: 175.853 ms

So far I've tried setting:

random_page_cost = 1.75 
effective_cache_size = 3

Also created:

create CLUSTER ticket USING ticket_1_idx;

Nothing works. What am I doing wrong? Why is it selecting sequential scan? The indexes are supposed to make the query fast. Anything that can be done to optimize it?

653

asked Dec 21 '12 22:12

user1754724

1 Answer

`CLUSTER`

If you intend to use CLUSTER, the displayed syntax is invalid.

~~create CLUSTER ticket USING ticket_1_idx;~~

Run once:

CLUSTER ticket USING ticket_1_idx;

This can help a lot with bigger result sets, because rows with adjacent timestamps end up on the same data pages. It helps much less when only a single row or a few rows are returned.
Postgres remembers which index to use for subsequent calls. If your table isn't read-only, the effect deteriorates over time and you need to re-run at certain intervals:

CLUSTER ticket;

Possibly only on volatile partitions. See below.
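To judge whether a re-run is due, you can check how closely the physical row order still follows the clustered column. This is a minimal sketch, assuming the table has been analyzed recently; it only reads the system views `pg_stats` and `pg_index`:

```sql
-- Estimated correlation between physical row order and the values of "created".
-- Values near 1 (or -1) suggest the table is still well clustered on this column.
SELECT correlation
FROM   pg_stats
WHERE  tablename = 'ticket'
AND    attname   = 'created';

-- The index that a plain "CLUSTER ticket;" would reuse, if one is remembered.
SELECT indexrelid::regclass AS clustered_index
FROM   pg_index
WHERE  indrelid = 'ticket'::regclass
AND    indisclustered;
```

The correlation is an estimate refreshed by ANALYZE, so treat it as a hint rather than an exact measurement.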

However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need to extend the underlying physical file too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:

Fill factor for a sequential index that is PK
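As a rough sketch of the FILLFACTOR idea (the value 90 is only an illustrative assumption, not a recommendation for this particular table):

```sql
-- Leave about 10% of each data page free so UPDATE can place new row versions
-- on the same page. 90 is an assumed starting point; tune it to your update rate.
ALTER TABLE ticket SET (fillfactor = 90);

-- The setting only affects pages written from now on. A table rewrite such as
-- CLUSTER repacks the existing pages with the new fill factor.
CLUSTER ticket USING ticket_1_idx;
```

A lower fill factor makes the table larger on disk, so it only pays off for tables that actually see frequent updates.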

`pg_repack` / `pg_squeeze`

CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.

Bold emphasis mine. Consider the alternatives!

pg_repack:

Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

and:

pg_repack needs to take an exclusive lock at the end of the reorganization.

As of this answer, the current version 1.4.7 works with PostgreSQL 9.4 - 14.

pg_squeeze is a newer alternative that claims:

In fact we try to replace pg_repack extension.

As of this answer, the current version 1.4 works with Postgres 10 - 14.

Query

The query is simple enough not to cause any performance problems per se.

However, a word on correctness: The BETWEEN construct includes boundaries. Your query selects all of Dec. 19, plus records from Dec. 20, 00:00 hours. That's an extremely unlikely requirement. Chances are, you really want:

SELECT *
FROM   ticket 
WHERE  created >= '2012-12-19 0:0'
AND    created <  '2012-12-20 0:0';
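To see the boundary behavior in isolation, here is a small check that runs without any table. It compares the upper bound against itself, once with BETWEEN and once with the half-open form:

```sql
SELECT timestamptz '2012-12-20 00:00:00+00'
         BETWEEN timestamptz '2012-12-19 00:00:00+00'
             AND timestamptz '2012-12-20 00:00:00+00'  AS between_includes_upper  -- true
     , timestamptz '2012-12-20 00:00:00+00'
         <  timestamptz '2012-12-20 00:00:00+00'       AS half_open_includes_upper; -- false
```

With the half-open form, consecutive day ranges never overlap, so a row stamped exactly at midnight is counted in one day only.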

Performance

First off, you ask:

Why is it selecting sequential scan?

Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
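If you want to re-check the plan yourself, something along these lines should do (the BUFFERS option is an addition here, not part of the question):

```sql
-- Shows the chosen plan plus actual timings and buffer usage.
-- Look for "Index Scan using ticket_1_idx" versus "Seq Scan on ticket".
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   ticket
WHERE  created >= '2012-12-19 0:0'
AND    created <  '2012-12-20 0:0';
```

Keep in mind that the runtime reported by EXPLAIN ANALYZE is measured on the server and does not include sending the rows to the client, so a query can feel much slower in an application than the roughly 176 ms shown in the question.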

You may be able to improve performance, but the necessary background information is not in the question. Possible options include:

Select only the columns you need instead of * to reduce transfer cost (the plan in the question reports width=1297 per row), among other performance benefits.
Look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
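```
  -- created is timestamptz: use a timestamptz constant. A plain timestamp makes
  -- the comparison time-zone dependent, which an index predicate does not allow.
  CREATE INDEX ticket_created_idx ON ticket(created)
  WHERE created >= '2012-12-01 00:00:00+00'::timestamptz;
```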
CREATE a new index right before the start of a new month. You can easily automate the task with a cron job. Optionally DROP partial indexes for old months later.

Keep the full (non-partial) index in addition for CLUSTER, which cannot operate on partial indexes. If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.
Then again if records never change at all, you probably don't need CLUSTER.
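The partitioning option could look roughly like this with declarative partitioning. This is a sketch under assumptions: it needs Postgres 10 or later (11 or later for the index on the partitioned parent), and the names `ticket_part`, `ticket_part_2012_12` and `ticket_part_created_idx` are hypothetical:

```sql
-- Parent table, partitioned by the timestamp column used in the range queries.
CREATE TABLE ticket_part (
  wid     bigint NOT NULL,
  created timestamp with time zone NOT NULL DEFAULT now(),
  status  integer NOT NULL DEFAULT 0
  -- remaining columns of ticket omitted for brevity
) PARTITION BY RANGE (created);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE ticket_part_2012_12 PARTITION OF ticket_part
  FOR VALUES FROM ('2012-12-01 00:00:00+00') TO ('2013-01-01 00:00:00+00');

-- Creates a matching index on every existing and future partition.
CREATE INDEX ticket_part_created_idx ON ticket_part (created);
```

A query restricted to a single day then only has to touch the partition for that month, and only recent partitions need to be re-clustered.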

Performance Basics

You may be missing one of the basics. All the usual performance advice applies:

https://wiki.postgresql.org/wiki/Slow_Query_Questions
https://wiki.postgresql.org/wiki/Performance_Optimization

147

answered Sep 30 '22 18:09

Erwin Brandstetter
