TimescaleDB: efficiently select last row

Tags: sql, postgresql, psycopg2, timescaledb

I have a postgres database with the timescaledb extension.

My primary index is a timestamp, and I would like to select the latest row.

If I happen to know the latest row happened after a certain time, then I can use a query such as:

query = 'select * from prices where time > %(dt)s'

Here I specify a datetime, and execute the query using psycopg2:

import datetime

import psycopg2

# 2018-01-10 11:15:00
dt = datetime.datetime(2018,1,10,11,15,0)

# params: dict of connection settings, defined elsewhere
with psycopg2.connect(**params) as conn:
    cur = conn.cursor()
    # start timing
    beg = datetime.datetime.now()
    # execute query
    cur.execute(query, {'dt':dt})
    rows = cur.fetchall()
    # stop timing
    end = datetime.datetime.now()

print('took {} ms'.format((end-beg).total_seconds() * 1e3))

The timing output:

took 2.296 ms

If, however, I don't know the time to input into the above query, I can use a query such as:

query = 'select * from prices order by time desc limit 1'

I execute the query in a similar fashion:

with psycopg2.connect(**params) as conn:
    cur = conn.cursor()
    # start timing
    beg = datetime.datetime.now()
    # execute query
    cur.execute(query)
    rows = cur.fetchall()
    # stop timing
    end = datetime.datetime.now()

print('took {} ms'.format((end-beg).total_seconds() * 1e3))

The timing output:

took 19.173 ms

So that's more than 8 times slower.

I'm no expert in SQL, but I would have thought the query planner would figure out that "limit 1" and "order by primary index" equates to an O(1) operation.

Question:

Is there a more efficient way to select the last row in my table?

In case it is useful, here is the description of my table:

# \d+ prices

                                           Table "public.prices"
 Column |            Type             | Collation | Nullable | Default | Storage | Stats target | Description 
--------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 time   | timestamp without time zone |           | not null |         | plain   |              | 
 AAPL   | double precision            |           |          |         | plain   |              | 
 GOOG   | double precision            |           |          |         | plain   |              | 
 MSFT   | double precision            |           |          |         | plain   |              | 
Indexes:
    "prices_time_idx" btree ("time" DESC)
Child tables: _timescaledb_internal._hyper_12_100_chunk,
              _timescaledb_internal._hyper_12_101_chunk,
              _timescaledb_internal._hyper_12_102_chunk,
              ...

asked Jul 28 '18 20:07

user123456789

2 Answers

An efficient way to get last / first record in TimescaleDB:

First record:

SELECT <COLUMN>, time FROM <TABLE_NAME> ORDER BY time ASC LIMIT 1 ;

Last record:

SELECT <COLUMN>, time FROM <TABLE_NAME> ORDER BY time DESC LIMIT 1 ;

The question has already been answered, but I believe this may be useful to people who land here. In my experience, using first() and last() in TimescaleDB takes much longer.
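To check the first()/last() claim on your own data, a small timing helper along these lines may help. It is only a sketch: the table `prices` and the quoted column `"AAPL"` are taken from the question, `median_ms` is a hypothetical helper name, and `cur` is assumed to be any DB-API cursor (for example one from psycopg2). Results will vary with version, chunk count, and caching.

```python
import statistics
import time

# The two query shapes being compared. "AAPL" is quoted because the
# column name in the question's table is upper case.
ORDER_BY_LIMIT = 'SELECT "AAPL", time FROM prices ORDER BY time DESC LIMIT 1'
LAST_AGGREGATE = 'SELECT last("AAPL", time) FROM prices'


def median_ms(cur, query, repeats=5):
    """Run `query` `repeats` times and return the median wall time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        cur.execute(query)
        cur.fetchall()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Usage (assumes psycopg2 and the connection `params` from the question):
# with psycopg2.connect(**params) as conn:
#     cur = conn.cursor()
#     print('order by / limit:', median_ms(cur, ORDER_BY_LIMIT), 'ms')
#     print('last():          ', median_ms(cur, LAST_AGGREGATE), 'ms')
```

Taking the median of several runs rather than a single measurement reduces the effect of a cold cache on the first execution.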

answered Oct 08 '22 04:10

DarkDiamonD

Your first query can exclude all but the last chunk, while your second query has to look in every chunk, since there is no information to help the planner exclude chunks. So it's not an O(1) operation but an O(n) operation, with n being the number of chunks for that hypertable.
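One way to see this for yourself is to ask PostgreSQL for the plan and count the chunk tables that appear in it. The sketch below assumes a DB-API cursor `cur` (for example from psycopg2) and chunk names of the form `_hyper_<n>_<m>_chunk`, as shown in the question's table description. The helper names `explain_plan` and `chunks_in_plan` are made up for illustration.

```python
import re

# Chunk tables look like _hyper_12_100_chunk in the question's \d+ output.
CHUNK_NAME = re.compile(r"_hyper_\d+_\d+_chunk")


def explain_plan(cur, query, params=None):
    """Return the EXPLAIN ANALYZE output for `query` as one string.

    Note that EXPLAIN ANALYZE actually executes the query.
    """
    cur.execute("EXPLAIN ANALYZE " + query, params)
    # Each plan line comes back as a one-column row.
    return "\n".join(row[0] for row in cur.fetchall())


def chunks_in_plan(plan_text):
    """Return the sorted, de-duplicated chunk names mentioned in a plan."""
    return sorted(set(CHUNK_NAME.findall(plan_text)))


# Usage (assumes psycopg2 and the connection `params` from the question):
# with psycopg2.connect(**params) as conn:
#     cur = conn.cursor()
#     plan = explain_plan(cur, 'select * from prices order by time desc limit 1')
#     print(len(chunks_in_plan(plan)), 'chunks appear in the plan')
```

If the unbounded query lists every chunk while the time-bounded one lists only the newest, that matches the explanation above.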

You could give that information to the planner by writing your query in the following form:

select * from prices WHERE time > now() - interval '1 day' order by time desc limit 1

You might have to choose a different interval depending on your chunk time interval.
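If you are unsure which interval is wide enough, one option is to try a narrow window first and widen it only when nothing comes back. The following is a minimal sketch, not a definitive implementation: `latest_row` and the default lookback steps are hypothetical, `cur` is assumed to be a DB-API cursor such as psycopg2's, and the cutoff is computed on the client, which assumes the client clock and time zone match the naive timestamps stored in `prices`.

```python
import datetime

BOUNDED_QUERY = (
    "select * from prices "
    "where time > %(since)s "
    "order by time desc limit 1"
)
UNBOUNDED_QUERY = "select * from prices order by time desc limit 1"

# Hypothetical lookback steps; pick values that suit your chunk interval.
DEFAULT_LOOKBACKS = (
    datetime.timedelta(days=1),
    datetime.timedelta(days=7),
    datetime.timedelta(days=30),
)


def latest_row(cur, now=None, lookbacks=DEFAULT_LOOKBACKS):
    """Return the newest row, trying progressively wider time windows."""
    now = now or datetime.datetime.now()
    for lookback in lookbacks:
        # A literal cutoff gives the planner something to exclude chunks with.
        cur.execute(BOUNDED_QUERY, {"since": now - lookback})
        row = cur.fetchone()
        if row is not None:
            return row
    # Nothing recent: fall back to scanning without a time bound.
    cur.execute(UNBOUNDED_QUERY)
    return cur.fetchone()


# Usage (assumes psycopg2 and the connection `params` from the question):
# with psycopg2.connect(**params) as conn:
#     print(latest_row(conn.cursor()))
```

The common case, where data arrived recently, costs a single query restricted to the newest chunk; the slower unbounded query runs only when every window is empty.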

Starting with TimescaleDB 1.2, this is an O(1) operation if an entry can be found in the most recent chunk, and the explicit time constraint in the WHERE clause is no longer needed if you order by time and have a LIMIT.

answered Oct 08 '22 03:10

Sven Klemm
