First and last value of window function in one row in PostgreSQL

Tags: postgresql, window-functions

I'd like to have the first value of one column and the last value of a second column in one row for a specified partition. For that I created this query:

SELECT DISTINCT
b.machine_id,
batch,
timestamp_sta,
timestamp_stp,
FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
LAST_VALUE(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
ORDER BY timestamp_sta, batch, machine_id;

But the values returned in the batch_end column are not correct.

The batch_start column correctly holds the first value of the timestamp_sta column. However, batch_end should be "2012-09-17 10:49:45", yet it equals timestamp_stp from the same row.

Why is it so?
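The symptom can be reproduced outside the original schema. This is a minimal sketch under several assumptions: a single hypothetical table `sta_stp` stands in for the joined `db_data.sta_stp`/`db_data.ll_lu` data, the timestamps are made up (only the last stop time matches the expected "2012-09-17 10:49:45"), and Python's built-in `sqlite3` module is used in place of PostgreSQL. SQLite applies the same default window frame when the window has an ORDER BY.

```python
import sqlite3

# Hypothetical stand-in for the joined data: one batch (1) on one machine (7)
# with three start/stop intervals. Timestamps are made up for illustration.
DATA = [
    (1, 7, "2012-09-17 10:00:00", "2012-09-17 10:15:00"),
    (1, 7, "2012-09-17 10:20:00", "2012-09-17 10:30:00"),
    (1, 7, "2012-09-17 10:35:00", "2012-09-17 10:49:45"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sta_stp (batch INTEGER, machine_id INTEGER,"
    " timestamp_sta TEXT, timestamp_stp TEXT)"
)
conn.executemany("INSERT INTO sta_stp VALUES (?, ?, ?, ?)", DATA)

# Same window as in the question: ORDER BY, but no explicit frame clause.
rows = conn.execute("""
    SELECT timestamp_sta,
           timestamp_stp,
           FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
           LAST_VALUE(timestamp_stp)  OVER w AS batch_end
    FROM sta_stp
    WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
    ORDER BY timestamp_sta
""").fetchall()

for row in rows:
    print(row)
# batch_start is the same on every row, but batch_end simply repeats
# the row's own timestamp_stp, as described in the question.
```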



asked Jun 05 '17 12:06

3 Answers

The question is old, but this solution is simpler and faster than what's been posted so far:

SELECT b.machine_id
     , batch
     , timestamp_sta
     , timestamp_stp
     , min(timestamp_sta) OVER w AS batch_start
     , max(timestamp_stp) OVER w AS batch_end
FROM   db_data.sta_stp a
JOIN   db_data.ll_lu   b ON a.ll_lu_id = b.id
WINDOW w AS (PARTITION BY batch, b.machine_id) -- No ORDER BY !
ORDER  BY timestamp_sta, batch, machine_id; -- why this ORDER BY?

If you add ORDER BY to the window definition, the default frame runs from the first row of the partition to the current row (and its peers). So each next row with a greater ORDER BY expression has a later frame end. Neither max() nor last_value() can return the "last" timestamp for the whole partition then. Without ORDER BY, all rows of the same partition are peers and you get your desired result.
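The difference between a window with and without ORDER BY can be checked directly. This is a minimal sketch, assuming a hypothetical simplified table `sta_stp` (no join) with made-up timestamps, and using Python's `sqlite3` module as a stand-in for PostgreSQL. It computes min(timestamp_sta) and max(timestamp_stp) over both window variants:

```python
import sqlite3

# Hypothetical sample data (made-up timestamps): batch 1 on machine 7.
DATA = [
    (1, 7, "2012-09-17 10:00:00", "2012-09-17 10:15:00"),
    (1, 7, "2012-09-17 10:20:00", "2012-09-17 10:30:00"),
    (1, 7, "2012-09-17 10:35:00", "2012-09-17 10:49:45"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sta_stp (batch INTEGER, machine_id INTEGER,"
    " timestamp_sta TEXT, timestamp_stp TEXT)"
)
conn.executemany("INSERT INTO sta_stp VALUES (?, ?, ?, ?)", DATA)


def batch_bounds(window_def):
    """Return (min(timestamp_sta), max(timestamp_stp)) per row for a window definition.

    ISO-formatted timestamp strings sort correctly as text, so min()/max() work here.
    """
    sql = f"""
        SELECT min(timestamp_sta) OVER ({window_def}),
               max(timestamp_stp) OVER ({window_def})
        FROM sta_stp
        ORDER BY batch, machine_id, timestamp_sta, timestamp_stp
    """
    return conn.execute(sql).fetchall()


# No ORDER BY: all rows of the partition are peers; the frame is the whole partition.
whole = batch_bounds("PARTITION BY batch, machine_id")

# ORDER BY without a frame clause: the frame start stays at the first row,
# but the frame end moves with the current row, so max() is a running maximum.
running = batch_bounds("PARTITION BY batch, machine_id ORDER BY timestamp_sta")

print(whole)
print(running)
```

With ORDER BY, min() still returns the true batch start on every row, which is why batch_start looked correct in the question, while max() only reaches the real batch end on the last row.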

Your added ORDER BY works (not the one in the window frame definition, the outer one), but doesn't seem to make sense and makes the query more expensive. You should probably use an ORDER BY clause that agrees with your window frame definition to avoid additional sort cost:

... 
ORDER BY batch, b.machine_id, timestamp_sta, timestamp_stp;

I don't see the need for DISTINCT in this query. You could just add it if you actually need it. Or DISTINCT ON (). But then the ORDER BY clause becomes even more relevant. See:

Select first row in each GROUP BY group?

If you need some other column(s) from the same row (while still sorting by timestamps), your idea with FIRST_VALUE() and LAST_VALUE() might be the way to go. You'd probably need to append this to the window frame definition then:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

See:

PostgreSQL query with max and min date plus associated id per row
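Pulling another column from the first and last row with first_value()/last_value() and a full frame could look like this. It is a minimal sketch, assuming a hypothetical `id` column on a simplified `sta_stp` table, made-up data, and Python's `sqlite3` module as a stand-in for PostgreSQL:

```python
import sqlite3

# Hypothetical sample data: (id, batch, machine_id, timestamp_sta, timestamp_stp).
DATA = [
    (101, 1, 7, "2012-09-17 10:00:00", "2012-09-17 10:15:00"),
    (102, 1, 7, "2012-09-17 10:20:00", "2012-09-17 10:30:00"),
    (103, 1, 7, "2012-09-17 10:35:00", "2012-09-17 10:49:45"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sta_stp (id INTEGER, batch INTEGER, machine_id INTEGER,"
    " timestamp_sta TEXT, timestamp_stp TEXT)"
)
conn.executemany("INSERT INTO sta_stp VALUES (?, ?, ?, ?, ?)", DATA)

# The explicit ROWS frame makes every row see the whole partition,
# so last_value() really returns values from the partition's last row.
rows = conn.execute("""
    SELECT id,
           first_value(id)           OVER w AS first_id,
           last_value(id)            OVER w AS last_id,
           last_value(timestamp_stp) OVER w AS batch_end
    FROM sta_stp
    WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY batch, machine_id, timestamp_sta
""").fetchall()

for row in rows:
    print(row)
```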

answered Sep 27 '22 22:09

Erwin Brandstetter

The explanations given by @Łukasz Kamiński solve the core of the issue.

However, last_value() should be replaced by max(). You are sorting by timestamp_sta, so the last value comes from the row with the greatest timestamp_sta, which does not necessarily have the greatest timestamp_stp. Also, I would sort by the two fields.

SELECT DISTINCT
  b.machine_id,
  batch,
  timestamp_sta,
  timestamp_stp,
  FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
  MAX(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id 
             ORDER BY timestamp_sta,timestamp_stp 
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY timestamp_sta, batch, machine_id;

http://rextester.com/UTDE60342
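The case where last_value() and max() disagree can be made concrete. This is a minimal sketch, assuming that intervals may overlap (the question does not say whether they can), a hypothetical simplified `sta_stp` table with made-up data, and Python's `sqlite3` module as a stand-in for PostgreSQL. It uses the window definition from the query above:

```python
import sqlite3

# Hypothetical overlapping intervals: the first one starts earliest but stops last.
DATA = [
    (1, 7, "2012-09-17 10:00:00", "2012-09-17 10:49:45"),
    (1, 7, "2012-09-17 10:20:00", "2012-09-17 10:30:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sta_stp (batch INTEGER, machine_id INTEGER,"
    " timestamp_sta TEXT, timestamp_stp TEXT)"
)
conn.executemany("INSERT INTO sta_stp VALUES (?, ?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT last_value(timestamp_stp) OVER w AS last_stp,
           max(timestamp_stp)        OVER w AS max_stp
    FROM sta_stp
    WINDOW w AS (PARTITION BY batch, machine_id
                 ORDER BY timestamp_sta, timestamp_stp
                 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY timestamp_sta
""").fetchall()

# last_value() takes timestamp_stp from the row with the latest timestamp_sta,
# while max() returns the latest stop time in the whole partition.
print(rows)
```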

answered Sep 28 '22 00:09

JGH

From the syntax documentation:

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The frame can be specified in either RANGE or ROWS mode; in either case, it runs from the frame_start to the frame_end. If frame_end is omitted, it defaults to CURRENT ROW.

A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.

And from the function list:

last_value(value any) returns value evaluated at the row that is the last row of the window frame

So the correct SQL should be:

SELECT DISTINCT
b.machine_id,
batch,
timestamp_sta,
timestamp_stp,
FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
LAST_VALUE(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta range between unbounded preceding and unbounded following)
ORDER BY timestamp_sta, batch, machine_id;
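The frame rules quoted above can be made visible by counting the rows in each frame. This is a minimal sketch, assuming a hypothetical simplified `sta_stp` table with made-up data and Python's `sqlite3` module as a stand-in for PostgreSQL. It compares the default frame with the explicit full frame from the corrected query:

```python
import sqlite3

# Hypothetical sample data (made-up timestamps): batch 1 on machine 7.
DATA = [
    (1, 7, "2012-09-17 10:00:00", "2012-09-17 10:15:00"),
    (1, 7, "2012-09-17 10:20:00", "2012-09-17 10:30:00"),
    (1, 7, "2012-09-17 10:35:00", "2012-09-17 10:49:45"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sta_stp (batch INTEGER, machine_id INTEGER,"
    " timestamp_sta TEXT, timestamp_stp TEXT)"
)
conn.executemany("INSERT INTO sta_stp VALUES (?, ?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT timestamp_sta,
           -- default frame: from the first row of the partition to the current row
           count(*) OVER (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
               AS default_frame_rows,
           -- explicit frame: the whole partition
           count(*) OVER w AS full_frame_rows,
           LAST_VALUE(timestamp_stp) OVER w AS batch_end
    FROM sta_stp
    WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta
                 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY timestamp_sta
""").fetchall()

for row in rows:
    print(row)
```

The default frame grows by one row at a time, which is why LAST_VALUE() without a frame clause only sees the current row. With the explicit frame, every row sees all three rows and gets the same batch_end.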

answered Sep 27 '22 22:09

Łukasz Kamiński


Asked by Michal Špondr