
postgres: Index on a timestamp field

I'm new to postgres and I have a question about the timestamp type.

To set the scene, I have a table like the one below:

CREATE TABLE IF NOT EXISTS tbl_example (
    example_id bigint not null,
    example_name text,
    example_timestamp timestamp,
    primary key (example_id)
);

Now I want to run a query that finds the list of examples for a specific date, using the timestamp.

For example, the common query that will always be run is:

select example_id, example_name, example_timestamp from tbl_example where example_timestamp = date_trunc('datepart', example_timestamp) order by example_timestamp desc;

However, to speed up the search process I was thinking of adding an index to the example_timestamp field:

CREATE INDEX idx_example_timestamp on tbl_example(example_timestamp);

My question is how Postgres builds the index on a timestamp. In other words, will it index the timestamp based on the date/time, or will it go down to the seconds/milliseconds, etc.?

Alternatively, I was thinking of creating a new column, 'example_date', and indexing that column instead to simplify things. I wasn't keen on having both a date and a timestamp field, since I can get the date from the timestamp field, but for indexing purposes I thought it might be best to create a separate field.

Any thoughts on this would be appreciated.

Thanks.

asked Dec 13 '19 21:12 by rm12345


3 Answers

Don’t worry, be happy

how does postgres perform the index on the timestamp - in other words will it index the timestamp based on the date/ time, or will it go into the seconds/ milliseconds, etc?

The internals of the indexing scheme used by Postgres should generally be transparent to you and of no concern. Also keep in mind that the implementation you study today may change in a future version of Postgres.

You are likely falling into the trap of premature optimization. Trust Postgres and its default behaviors until you know you have a demonstrable performance problem.

Moments

Date-time handling is more complicated than you might understand.

Firstly, you are using TIMESTAMP which is actually the abbreviated name for TIMESTAMP WITHOUT TIME ZONE. This type cannot represent a moment. This type stores only a date and a time-of-day. For example, January 23 2020 at 12:00 noon. But does that mean noon in Tokyo Japan? Or noon in Paris France, several hours later? Or noon in Toledo Ohio US, several more hours later?
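To make the ambiguity concrete, here is a small Java sketch using java.time. A LocalDateTime is the Java counterpart of a TIMESTAMP WITHOUT TIME ZONE value, and resolving the same noon in three zones yields three different moments. The class and method names are illustrative only, and Toledo is represented by the America/New_York zone, which Ohio observes.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// A LocalDateTime, like a TIMESTAMP WITHOUT TIME ZONE value, is only a date and a time-of-day.
class AmbiguousNoon {

    // Pin the zone-less date-time to the timeline by supplying a time zone.
    static Instant momentIn(LocalDateTime dateTime, String zoneName) {
        return dateTime.atZone(ZoneId.of(zoneName)).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime noon = LocalDateTime.of(2020, 1, 23, 12, 0);
        // Toledo, Ohio observes the America/New_York zone.
        String[] zones = {"Asia/Tokyo", "Europe/Paris", "America/New_York"};
        for (String zone : zones) {
            System.out.println(zone + " -> " + momentIn(noon, zone));
        }
    }
}
```

On that winter date the three results are 03:00, 11:00 and 17:00 UTC respectively: one "noon", three moments hours apart.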

I suggest always expanding the type name out fully to be very clear in your SQL. Use TIMESTAMP WITHOUT TIME ZONE rather than TIMESTAMP.

But if you are actually trying to represent moments, specific points on the timeline, you must use TIMESTAMP WITH TIME ZONE. This name comes from the SQL Standard, but in Postgres, and some other databases, it is a bit of a misnomer. Postgres does not actually store the time zone. Instead, Postgres uses any time zone or offset-from-UTC information submitted along with an input to adjust into UTC. The value written to storage is always in UTC. If you care about the original zone name or offset numbers (hours-minutes-seconds), then you need to store that in a second column.
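As a rough Java analogy (not Postgres's own code), the adjustment described above behaves like converting an OffsetDateTime to an Instant: the moment survives, the original offset does not. The class and method names below are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

class NormalizeToUtc {

    // Java analogy (not Postgres code) of TIMESTAMP WITH TIME ZONE input handling:
    // use the offset to adjust into UTC, then keep only the UTC moment.
    static OffsetDateTime adjustToUtc(OffsetDateTime input) {
        return input.toInstant().atOffset(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        OffsetDateTime tokyoNoon = OffsetDateTime.parse("2020-01-23T12:00+09:00");
        OffsetDateTime inUtc = adjustToUtc(tokyoNoon);

        System.out.println(inUtc);                     // 2020-01-23T03:00Z
        System.out.println(inUtc.isEqual(tokyoNoon));  // true: same moment on the timeline
        // The +09:00 offset is not part of the UTC value; store it in a second column if you need it.
        System.out.println(tokyoNoon.getOffset());     // +09:00
    }
}
```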

When retrieved from the database, the value comes out in UTC as well. But be aware that some middleware tools insist on applying a default time zone to the value after retrieval. While well-intentioned, this anti-feature can cause much confusion. You will have no such confusion when using java.time objects as shown below.

Span-of-time queries

Postgres stores a moment in UTC, likely as a count from an epoch-reference date-time, given that the data type is documented as being 64 bits (8 octets). According to Wikipedia, Postgres uses an epoch reference of 2000-01-01, presumably the first moment of that date in UTC, 2000-01-01T00:00:00.0Z. We have no reason to care which epoch reference is used, but there you go.

The real point is that a date-time value in Postgres is stored simply as a number, a count of microseconds. The timestamp types are not a specific date and a time-of-day as you may be thinking. Your queries certainly may benefit from an index on the timestamp column, but date-oriented (without time-of-day) queries will not benefit specifically. The index is not date-oriented, nor can it be as I will explain next.
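A sketch of that mental model in Java, assuming the 2000-01-01T00:00:00Z epoch mentioned above and microsecond resolution: each timestamp maps to a plain 64-bit count, and moments one microsecond apart map to different counts. This only models the idea; it does not read Postgres storage, and the helper name is made up.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class PgEpochMicros {

    // Assumption taken from the answer: the epoch reference is 2000-01-01T00:00:00Z.
    static final Instant PG_EPOCH = Instant.parse("2000-01-01T00:00:00Z");

    // Mental model only: the signed count of microseconds a timestamp value corresponds to.
    // This does not read anything from Postgres storage.
    static long microsSincePgEpoch(Instant moment) {
        return ChronoUnit.MICROS.between(PG_EPOCH, moment);
    }

    public static void main(String[] args) {
        Instant a = Instant.parse("2019-12-13T21:12:00Z");
        Instant b = a.plus(1, ChronoUnit.MICROS);
        // Two moments one microsecond apart map to two different numbers, so an index
        // on the column orders values down to the microsecond, not by date.
        System.out.println(microsSincePgEpoch(a));
        System.out.println(microsSincePgEpoch(b) - microsSincePgEpoch(a)); // 1
    }
}
```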

Determining a date from a moment requires a time zone. For any given moment, the date varies around the globe by time zone. A few minutes after midnight in Paris France is a new day while still “yesterday” in Montréal Québec.
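A quick Java illustration of the Paris/Montréal example, assuming the standard tz names Europe/Paris and America/Montreal: one moment, two dates. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class DateDependsOnZone {

    // The date of a moment only exists relative to a time zone.
    static LocalDate dateIn(Instant moment, String zoneName) {
        return moment.atZone(ZoneId.of(zoneName)).toLocalDate();
    }

    public static void main(String[] args) {
        // Five minutes after midnight in Paris (UTC+01:00 in January).
        Instant moment = Instant.parse("2020-01-23T23:05:00Z");
        System.out.println(dateIn(moment, "Europe/Paris"));      // 2020-01-24
        System.out.println(dateIn(moment, "America/Montreal"));  // 2020-01-23
    }
}
```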

To query moments by date, you need to determine the first moment of the day, and the first moment of the following day. Then we use the Half-Open approach to defining a span-of-time where the beginning is inclusive while the ending is exclusive. We search for moments that are equal to or later than the beginning while also being before the ending. Tip: another way of saying "equal to or later than the beginning" is "not before".
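The Half-Open test itself is a one-liner. This Java sketch (hypothetical class and method names) contrasts it with a fully-closed check, which is how BETWEEN behaves. UTC day boundaries are used only to keep the example short; a real date query should use the zone-aware boundaries discussed in this answer.

```java
import java.time.Instant;

class HalfOpen {

    // Half-Open span: the beginning is inclusive ("not before"), the ending is exclusive.
    static boolean contains(Instant start, Instant stop, Instant moment) {
        return !moment.isBefore(start) && moment.isBefore(stop);
    }

    // Fully-closed span, as SQL BETWEEN behaves: both ends inclusive.
    static boolean closedContains(Instant start, Instant stop, Instant moment) {
        return !moment.isBefore(start) && !moment.isAfter(stop);
    }

    public static void main(String[] args) {
        // UTC days here only for brevity.
        Instant dayStart = Instant.parse("2020-01-23T00:00:00Z");
        Instant nextDayStart = Instant.parse("2020-01-24T00:00:00Z");
        // The next day's first moment is excluded under Half-Open rules...
        System.out.println(contains(dayStart, nextDayStart, nextDayStart));        // false
        // ...while a closed range would count it in both days.
        System.out.println(closedContains(dayStart, nextDayStart, nextDayStart));  // true
    }
}
```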

If you are using Java, you can make use of the industry-leading java.time classes there.

The java.time classes use a resolution of nanoseconds, finer than the microseconds used in Postgres. So you will have no problem loading Postgres values into Java. However, beware of data loss when going the other direction as nanoseconds will be silently truncated to store only microseconds.
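If you want that precision loss to happen explicitly in your own code, you can drop the sub-microsecond digits before sending a value. A minimal sketch; whether a particular driver truncates or rounds on its own is a driver detail not covered here, and the method name is hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class MicrosecondPrecision {

    // Drop digits finer than microseconds before sending a value to Postgres,
    // so the loss happens explicitly in your code rather than in the driver.
    static Instant toMicros(Instant moment) {
        return moment.truncatedTo(ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        Instant precise = Instant.parse("2020-01-23T12:00:00.123456789Z");
        System.out.println(toMicros(precise));  // 2020-01-23T12:00:00.123456Z
    }
}
```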

When determining the first moment of the day, do not assume the day starts at 00:00:00.0. Some dates in some zones start at another time such as 01:00:00.0. Always let java.time determine the first moment of the day.

ZoneId z = ZoneId.of( "Asia/Tokyo" ) ;                          // Or `Africa/Tunis`, `America/Montreal`, etc.
LocalDate today = LocalDate.now( z ) ;
ZonedDateTime zdtStart = today.atStartOfDay( z ) ;              // First moment of the day.
ZonedDateTime zdtStop = today.plusDays( 1 ).atStartOfDay( z ) ; // First moment of the following day.
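To see why atStartOfDay matters, consider a zone that starts daylight saving time at midnight. This sketch assumes the tz data bundled with your JDK includes Cuba's rule of moving clocks from 00:00 to 01:00 on the second Sunday of March, which was 2024-03-10; on that date America/Havana has no 00:00 at all. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class FirstMomentOfDay {

    static ZonedDateTime firstMoment(LocalDate date, ZoneId zone) {
        // Let java.time consult the zone rules rather than assuming 00:00.
        return date.atStartOfDay(zone);
    }

    public static void main(String[] args) {
        // Assumes the JDK's tz data has Cuba starting DST at midnight on 2024-03-10,
        // so the clock jumps from 00:00 straight to 01:00 that day.
        ZonedDateTime start = firstMoment(LocalDate.of(2024, 3, 10), ZoneId.of("America/Havana"));
        System.out.println(start);  // 2024-03-10T01:00-04:00[America/Havana]
    }
}
```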

Write your Half-Open SQL statement. Do not use the SQL BETWEEN operator, because it is not Half-Open: it includes both endpoints.

String sql = "SELECT * FROM tbl WHERE event >= ? AND event < ? ;" ;  // Half-Open query in SQL: "not before" the start, and before the stop.

Pass your beginning and ending values to a prepared statement.

A JDBC driver compliant with JDBC 4.2 or later can work with most java.time types via PreparedStatement::setObject and ResultSet::getObject. Oddly, the JDBC spec does not require support for the two most commonly used types: Instant (always in UTC) and ZonedDateTime. These may or may not work with your specific driver. The standard does require support for OffsetDateTime, so let's convert to that.

preparedStatement.setObject( 1 , zdtStart.toOffsetDateTime() ) ;
preparedStatement.setObject( 2 , zdtStop.toOffsetDateTime() ) ;

The resulting OffsetDateTime objects passed to the PreparedStatement will carry the offset used by that time zone at that date-time. For debugging, or out of curiosity, you may want to see those values in UTC. So let's adjust to UTC by extracting an Instant, and then applying an offset of zero hours-minutes-seconds to get an OffsetDateTime carrying an offset of UTC itself.

OffsetDateTime start = zdtStart.toInstant().atOffset( ZoneOffset.UTC ) ;
OffsetDateTime stop = zdtStop.toInstant().atOffset( ZoneOffset.UTC ) ;

Pass to the prepared statement.

preparedStatement.setObject( 1 , start ) ;
preparedStatement.setObject( 2 , stop ) ;
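Putting the bound calculations together, here is a self-contained sketch (hypothetical class and method names) that computes both Half-Open bounds for one date in Asia/Tokyo, adjusted to UTC, and confirms that the zone-offset form and the UTC form denote the same moment. Tokyo is used because its offset is a fixed +09:00, which keeps the expected values easy to check.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class QueryBounds {

    // Returns {start, stop} for the Half-Open span of one date in one zone, adjusted to UTC.
    static OffsetDateTime[] utcBoundsFor(LocalDate date, ZoneId zone) {
        ZonedDateTime zdtStart = date.atStartOfDay(zone);
        ZonedDateTime zdtStop = date.plusDays(1).atStartOfDay(zone);
        return new OffsetDateTime[] {
            zdtStart.toInstant().atOffset(ZoneOffset.UTC),
            zdtStop.toInstant().atOffset(ZoneOffset.UTC)
        };
    }

    public static void main(String[] args) {
        ZoneId tokyo = ZoneId.of("Asia/Tokyo");
        LocalDate date = LocalDate.of(2020, 1, 23);
        OffsetDateTime[] bounds = utcBoundsFor(date, tokyo);
        System.out.println(bounds[0] + " to " + bounds[1]);  // 2020-01-22T15:00Z to 2020-01-23T15:00Z
        // The zone's own offset and the UTC form denote the same moment.
        System.out.println(date.atStartOfDay(tokyo).toOffsetDateTime().isEqual(bounds[0]));  // true
    }
}
```

Either the zone-offset values or the UTC values can be passed to setObject; the database sees the same moments.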

Once these start and stop values arrive at the database server, they will be converted to a number representing a count-from-epoch, a simple integer. Then Postgres performs a simple number comparison. If an index exists on those integer numbers, that index may or may not be utilized as the Postgres query planner sees fit.

If you have a relatively small number of rows, and plenty of RAM to cache them, you may not need an index. Perform tests, and use EXPLAIN ANALYZE to see the real-world performance.

Date column via Java

If you have done the work to prove a performance problem with date-oriented queries, you could add a second column of type DATE. Then index that column, and refer to it explicitly in your date-oriented queries.

When inserting your moment, also include a calculated value for the date as perceived in whatever time zone makes sense to your app. Just be sure to clearly document your intentions, and the specifics of the time zone used in determining the date. Tip: Postgres lets you attach a descriptive comment to a column (COMMENT ON COLUMN), a good place for that documentation.

As a second DATE column is derived from another column, it is by definition redundant and therefore de-normalized. As a rule, consider de-normalizing only as a last resort.

Java code when inserting a value.

String sql = "INSERT INTO tbl ( event , date_tokyo ) VALUES ( ? , ? ) ;" ;

Determine the current moment, and the current moment's date as perceived in time zone Asia/Tokyo.

Instant now = Instant.now() ;  // Always in UTC, no need to specify a time zone here.
OffsetDateTime odt = now.atOffset( ZoneOffset.UTC ) ;  // Convert from `Instant` to `OffsetDateTime` if your JDBC driver does not support `Instant`.
ZoneId z = ZoneId.of( "Asia/Tokyo" ) ;
ZonedDateTime zdt = now.atZone( z ) ;
LocalDate localDate = zdt.toLocalDate() ; // Extract the date as seen at this moment by people in the Tokyo time zone.

Pass to your prepared statement.

preparedStatement.setObject( 1 , odt ) ;
preparedStatement.setObject( 2 , localDate ) ;

Now you can make date-oriented queries on the date_tokyo column. Index if need be.
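Whichever way you query, a date_tokyo predicate and a Half-Open range over the moment column should select the same rows, as long as both use the same zone. This Java sketch (hypothetical names, sampling moments every 7 minutes across about a week) checks that the two predicates agree.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class DateColumnEquivalence {

    static final ZoneId TOKYO = ZoneId.of("Asia/Tokyo");

    // What a query on the date_tokyo column checks.
    static boolean byDateColumn(Instant moment, LocalDate date) {
        return moment.atZone(TOKYO).toLocalDate().equals(date);
    }

    // What a Half-Open query on the moment column checks.
    static boolean byMomentRange(Instant moment, LocalDate date) {
        Instant start = date.atStartOfDay(TOKYO).toInstant();
        Instant stop = date.plusDays(1).atStartOfDay(TOKYO).toInstant();
        return !moment.isBefore(start) && moment.isBefore(stop);
    }

    // Count disagreements between the two approaches over a sample of moments.
    static int mismatches(Instant from, Duration step, int samples, LocalDate date) {
        int count = 0;
        Instant moment = from;
        for (int i = 0; i < samples; i++) {
            if (byDateColumn(moment, date) != byMomentRange(moment, date)) {
                count++;
            }
            moment = moment.plus(step);
        }
        return count;
    }

    public static void main(String[] args) {
        int bad = mismatches(Instant.parse("2020-01-21T00:00:00Z"), Duration.ofMinutes(7), 1500,
                LocalDate.of(2020, 1, 23));
        System.out.println("mismatches: " + bad);  // mismatches: 0
    }
}
```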

Date column via SQL

Alternatively, you could populate that date_tokyo column automatically within Postgres.

Trigger

You could write a trigger that uses date-time functions built into Postgres to determine the date of that moment as seen in the time zone Asia/Tokyo. The trigger could then write the resulting date value into that second column.

Generated value column

Or, with Postgres 12, you can more simply use the new generated columns feature. This new feature does the same work, but without the bother of defining and attaching a trigger. For discussion of this new feature, see:

  • New In PostgreSQL 12: Generated Columns
  • Generated columns in PostgreSQL 12, by Kirk Roybal
  • PostgreSQL 12: generated columns, by Daniel Westermann

In Postgres 12, a column with GENERATED ALWAYS AS (…) STORED has its value physically stored, and can be indexed.

Caveat

Crucial to such date-time work is correct information about the current definitions of time zones. Usually this information comes via the tz data maintained by ICANN/IANA.

Both Java and Postgres contain their own copy of tz data.

Politicians around the world have shown a penchant for redefining their time zones, often with little or no warning. So be sure to keep track of changes to the time zones you care about. When you update Java or Postgres, you will likely get a fresh copy of the tz data. But in some cases you may need to manually update either or both environments (Java and Postgres). Note that your host OS has its own copy of the tz data as well.
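On the Java side, you can ask the running JVM which tz data version it uses for a zone; ZoneRulesProvider.getVersions is part of java.time.zone. A small sketch (class and method names hypothetical); checking the Postgres and OS copies is a separate task not shown here.

```java
import java.time.zone.ZoneRulesProvider;

class TzdataVersion {

    // Latest version label of the tz data the running JVM holds for a zone (something like "2024a").
    static String latestVersion(String zoneId) {
        return ZoneRulesProvider.getVersions(zoneId).lastKey();
    }

    public static void main(String[] args) {
        System.out.println("Asia/Tokyo rules version: " + latestVersion("Asia/Tokyo"));
    }
}
```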

answered Oct 17 '22 14:10 by Basil Bourque


This is a regurgitation of what Percona recommends.

They recommend the BRIN index.

I needed this proof for fetching sets of records ordered by timestamptz. Even though the example below uses timestamp, I use timestamptz.

  1. My records are chronological, and old timestamptz values are never updated or deleted.

  2. Other columns are updated only in recent records. Older records are untouched.

My table will have a few million records.
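Why chronological, append-mostly data suits BRIN can be shown with a toy model. This is a deliberately simplified Java sketch, not how Postgres implements BRIN: it keeps one min/max summary per range of consecutive rows and counts how many ranges a range query would have to read. The 128-row range size and the scrambled ordering are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ToyBrin {

    // Toy model, not Postgres internals: one {min, max} summary per range of consecutive rows.
    static List<long[]> summarize(long[] rows, int rowsPerRange) {
        List<long[]> summaries = new ArrayList<>();
        for (int first = 0; first < rows.length; first += rowsPerRange) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            int end = Math.min(first + rowsPerRange, rows.length);
            for (int i = first; i < end; i++) {
                min = Math.min(min, rows[i]);
                max = Math.max(max, rows[i]);
            }
            summaries.add(new long[] {min, max});
        }
        return summaries;
    }

    // Number of row ranges whose summary overlaps [lo, hi] and so must be read and rechecked.
    static int rangesToScan(List<long[]> summaries, long lo, long hi) {
        int count = 0;
        for (long[] s : summaries) {
            if (s[1] >= lo && s[0] <= hi) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long[] chronological = new long[n];
        long[] scrambled = new long[n];
        for (int i = 0; i < n; i++) {
            chronological[i] = i;              // inserted in timestamp order
            scrambled[i] = (i * 7919L) % n;    // same values, no physical order
        }
        System.out.println(rangesToScan(summarize(chronological, 128), 5_000, 5_009));  // 1
        System.out.println(rangesToScan(summarize(scrambled, 128), 5_000, 5_009));
    }
}
```

With the values stored in order, only one range overlaps the query; with the same values scattered, many ranges do, which is when a BRIN index stops paying off.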

You could test your queries. I use pgAdmin.

CREATE TABLE testtab (id int NOT NULL PRIMARY KEY,date TIMESTAMP NOT NULL, level INTEGER, msg TEXT);

create index testtab_date_idx  on testtab(date);

"Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=848.040..862.638 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=832.108..832.109 rows=0 loops=3)"
"        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))"
"        Rows Removed by Filter: 2666667"
"Planning Time: 0.238 ms"
"Execution Time: 862.662 ms"

explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

"Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=666.283..681.586 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=650.661..650.661 rows=0 loops=3)"
"        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))"
"        Rows Removed by Filter: 2666667"
"Planning Time: 0.069 ms"
"Execution Time: 681.617 ms"

create index testtab_date_brin_idx  on rm_owner.testtab using brin (date);

explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

"Bitmap Heap Scan on testtab  (cost=20.03..33406.84 rows=1 width=49) (actual time=0.143..0.143 rows=0 loops=1)"
"  Recheck Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))"
"  ->  Bitmap Index Scan on "testtab_date_brin_idx "  (cost=0.00..20.03 rows=12403 width=0) (actual time=0.141..0.141 rows=0 loops=1)"
"        Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))"
"Planning Time: 0.126 ms"
"Execution Time: 0.161 ms"

Update: All the examples I have seen are like the one described here.

answered Oct 17 '22 14:10 by Mohan Radhakrishnan


Do it!

The default Postgres index type is a B-tree, which keeps its keys sorted.

Therefore, putting an index on the example_timestamp column should make queries on it more efficient. Remember that indexes have a downside: inserts get a bit heavier, because the tree has to be kept balanced.
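As an analogy only (Java's TreeMap is a red-black tree, not a B-tree), any sorted index lets a range lookup walk a contiguous run of keys. The sketch below keeps full-precision timestamps as keys and still answers a whole-day question with a Half-Open slice; the names and sample data are made up.

```java
import java.time.Instant;
import java.util.NavigableMap;
import java.util.TreeMap;

class SortedIndexAnalogy {

    // Analogy only: TreeMap is a red-black tree, not a B-tree, but like a B-tree index
    // it keeps keys sorted, so a range lookup walks a contiguous slice of keys.
    static NavigableMap<Instant, Long> rangeScan(NavigableMap<Instant, Long> index,
                                                 Instant start, Instant stop) {
        return index.subMap(start, true, stop, false);  // Half-Open: [start, stop)
    }

    public static void main(String[] args) {
        NavigableMap<Instant, Long> index = new TreeMap<>();  // timestamp -> example_id
        index.put(Instant.parse("2019-12-12T23:59:59.999999Z"), 1L);
        index.put(Instant.parse("2019-12-13T08:30:00Z"), 2L);
        index.put(Instant.parse("2019-12-13T21:12:00.000001Z"), 3L);
        index.put(Instant.parse("2019-12-14T00:00:00Z"), 4L);

        // Full-precision keys still support a whole-day lookup (a UTC day here).
        System.out.println(rangeScan(index,
                Instant.parse("2019-12-13T00:00:00Z"),
                Instant.parse("2019-12-14T00:00:00Z")).values());  // [2, 3]
    }
}
```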

Good luck

For more info, check out this video: https://youtu.be/clrtT_4WBAw

answered Oct 17 '22 12:10 by idoshveki