Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TimescaleDB: efficiently select last row

I have a postgres database with the timescaledb extension.

My primary index is a timestamp, and I would like to select the latest row.

If I happen to know the latest row happened after a certain time, then I can use a query such as:

query = 'select * from prices where time > %(dt)s'

Here I specify a datetime, and execute the query using psycopg2:

# 2018-01-10 11:15:00
dt = datetime.datetime(2018,1,10,11,15,0)

with psycopg2.connect(**params) as conn:
    cur = conn.cursor()
    # start timing
    beg = datetime.datetime.now()
    # execute query
    cur.execute(query, {'dt':dt})
    rows = cur.fetchall()
    # stop timing
    end = datetime.datetime.now()

print('took {} ms'.format((end-beg).total_seconds() * 1e3))

The timing output:

took 2.296 ms

If, however, I don't know the time to input into the above query, I can use a query such as:

query = 'select * from prices order by time desc limit 1'

I execute the query in a similar fashion

with psycopg2.connect(**params) as conn:
    cur = conn.cursor()
    # start timing
    beg = datetime.datetime.now()
    # execute query
    cur.execute(query)
    rows = cur.fetchall()
    # stop timing
    end = datetime.datetime.now()

print('took {} ms'.format((end-beg).total_seconds() * 1e3))

The timing output:

took 19.173 ms

So that's more than 8 times slower.

I'm no expert in SQL, but I would have thought the query planner would figure out that "limit 1" and "order by primary index" equates to an O(1) operation.

Question:

Is there a more efficient way to select the last row in my table?

In case it is useful, here is the description of my table:

# \d+ prices

                                           Table "public.prices"
 Column |            Type             | Collation | Nullable | Default | Storage | Stats target | Description 
--------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 time   | timestamp without time zone |           | not null |         | plain   |              | 
 AAPL   | double precision            |           |          |         | plain   |              | 
 GOOG   | double precision            |           |          |         | plain   |              | 
 MSFT   | double precision            |           |          |         | plain   |              | 
Indexes:
    "prices_time_idx" btree ("time" DESC)
Child tables: _timescaledb_internal._hyper_12_100_chunk,
              _timescaledb_internal._hyper_12_101_chunk,
              _timescaledb_internal._hyper_12_102_chunk,
              ...
like image 644
user123456789 Avatar asked Jul 28 '18 20:07

user123456789


People also ask

How do I get the latest inserted record in PostgreSQL?

The typical way of doing this is to check that no row has a higher timestamp than any row we retrieve. SELECT timestamp, value, card FROM my_table t1 WHERE NOT EXISTS ( SELECT * FROM my_table t2 WHERE t2. timestamp > t1. timestamp );

How do I get the latest record using timestamp in SQL?

To get the last updated record in SQL Server: We can write trigger (which automatically fires) i.e. whenever there is a change (update) that occurs on a row, the “lastupdatedby” column value should get updated by the current timestamp.

How do I select the second last row in SQL?

Here is the query to get the second last row of a table in MySQL. mysql> select *from secondLastDemo order by StudentId DESC LIMIT 1,1; The output displays the second last record.


2 Answers

An efficient way to get last / first record in TimescaleDB:

First record:

SELECT <COLUMN>, time FROM <TABLE_NAME> ORDER BY time ASC LIMIT 1 ;

Last record:

SELECT <COLUMN>, time FROM <TABLE_NAME> ORDER BY time DESC LIMIT 1 ;

The question has already answered but I believe it might be useful if people will get here. Using first() and last() in TimescaleDB takes much longer.

like image 133
DarkDiamonD Avatar answered Oct 08 '22 04:10

DarkDiamonD


Your first query can exclude all but the last chunk, while your second query has to look in every chunk since there is no information to help the planner exclude chunks. So its not an O(1) operation but an O(n) operation with n being the number of chunks for that hypertable.

You could give that information to the planner by writing your query in the following form:

select * from prices WHERE time > now() - interval '1day' order by time desc limit 1

You might have to choose a different interval depending on your chunk time interval.

Starting with TimescaleDB 1.2 this is an O(1) operation if an entry can be found in the most recent chunk and the explicit time constraint in the WHERE clause is no longer needed if you order by time and have a LIMIT.

like image 4
Sven Klemm Avatar answered Oct 08 '22 03:10

Sven Klemm