Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL - how should I use first_value()?

This answer to shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals

I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for first_value().

So far, I have tried two queries:

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
    )
ORDER BY cstamp;

Here's a piece of the result:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004

As you can see, 77.8 is not what I believe is the correct value for first_value(), which should be 77.0.

I though this might be due to the ambiguous ORDER BY in the WINDOW, so I changed this to

ORDER BY cstamp ASC 

but this appears to upset the PARTITION as well:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000

since the values for max and last now vary within the partition.

What am I doing wrong? Could someone help me better to understand the relation between PARTITION and ORDER within a WINDOW?


Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.

CREATE TABLE wtest (
    cstamp timestamp without time zone,
    price numeric(10,5)
);

COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.
like image 917
Brent.Longborough Avatar asked Mar 30 '13 17:03

Brent.Longborough


People also ask

How to get FIRST VALUE of a column in PostgreSQL?

In PostgreSQL, the FIRST_VALUE() function is used to return the first value in a sorted partition of a result set. Syntax: FIRST_VALUE ( expression ) OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... )

Does PostgreSQL support window function?

Introduction to PostgreSQL Window Functions. PostgreSQL window functions are used to compare contrast data, and it is a key to analytic and various use cases in PostgreSQL. The window function in PostgreSQL is used to compare the values between current rows and related to the current rows from all values of a table.

What is over clause in PostgreSQL?

The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s).

How do I find the Postgres row number?

In PostgreSQL, the ROW_NUMBER() function is used to assign a unique integer to every row that is returned by a query. Syntax: ROW_NUMBER() OVER( [PARTITION BY column_1, column_2, …] [ORDER BY column_3, column_4, …] )


2 Answers

SQL Fiddle

All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...):

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    range between unbounded preceding and unbounded following
    )
ORDER BY cstamp;
like image 196
Clodoaldo Neto Avatar answered Oct 13 '22 20:10

Clodoaldo Neto


Here's a quick query to illustrate the behaviour:

select 
  v,
  first_value(v) over w1 f1,
  first_value(v) over w2 f2,
  first_value(v) over w3 f3,
  last_value (v) over w1 l1,
  last_value (v) over w2 l2,
  last_value (v) over w3 l3,
  max        (v) over w1 m1,
  max        (v) over w2 m2,
  max        (v) over w3 m3,
  max        (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
  w1 as (order by v),
  w2 as (order by v rows between unbounded preceding and current row),
  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

like image 32
Lukas Eder Avatar answered Oct 13 '22 21:10

Lukas Eder