How to use LAST_VALUE in PostgreSQL?

I have a little table to try to understand how the LAST_VALUE function works in PostgreSQL. It looks like this:

 id | value
----+--------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | [null]
  6 | F

What I want to do is use LAST_VALUE to fill the NULL value with the preceding non-NULL value, so the result should be this:

 id | value
----+--------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | E
  6 | F

The query I tried to accomplish that is:

SELECT LAST_VALUE(value)
OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;

From what I understand of the LAST_VALUE function, it takes all the rows before the current one as a window, sorts them according to the ORDER BY clause, and then returns the last row of the window. With my ORDER BY, all the rows containing a NULL should be put at the top of the window, so LAST_VALUE should return the last non-NULL value. But it doesn't.

I am clearly missing something. Please help.

Gaëtan asked Sep 25 '19 11:09


2 Answers

I'm not sure last_value will do what you want. It would be better to use lag:

SELECT id,
       coalesce(value, lag(value) OVER (ORDER BY id))
FROM test;
 id | coalesce
----+----------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | E
  6 | F
(7 rows)

last_value returns the last value of the current window frame. Since you partitioned by id, each partition holds a single row, so the frame never contains more than one value. lag returns the value from the previous row of the partition (an offset of 1 by default), which seems to be exactly what you want.
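
One caveat worth keeping in mind: lag looks back exactly one row by default, so it only fills isolated NULLs. The sketch below uses hypothetical inline data (not the table from the question) with two consecutive NULLs to illustrate this; the column name filled is made up for the example.

```sql
-- Hypothetical data: ids 5 and 6 are both NULL.
WITH t(id, value) AS (
    VALUES (4, 'E'), (5, NULL), (6, NULL), (7, 'F')
)
SELECT id,
       -- lag(value) is the value of the immediately preceding row only
       coalesce(value, lag(value) OVER (ORDER BY id)) AS filled
FROM t
ORDER BY id;
-- id 5 is filled with 'E', but id 6 stays NULL,
-- because the row before it (id 5) is itself NULL.
```

If runs of several NULLs can occur in the real data, a grouping approach that carries the last non-NULL value forward is likely a better fit than lag.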

To expand on this answer a bit, you can use row_number() to get a good idea of the partition you are looking at. For your proposed solution, look at the row numbers for each row when you partition by id:

SELECT id, row_number() OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;
 id | row_number
----+------------
  0 |          1
  1 |          1
  2 |          1
  3 |          1
  4 |          1
  5 |          1
  6 |          1
(7 rows)

Each row is its own partition, so you won't be able to get any values from other rows.

If we don't partition by id, but still use your ordering, you can see why this still won't work for last_value:

SELECT id, row_number() OVER (ORDER BY CASE WHEN value IS NULL THEN 0 ELSE 1 END ASC, id)
FROM test;
 id | row_number
----+------------
  5 |          1
  0 |          2
  1 |          3
  2 |          4
  3 |          5
  4 |          6
  6 |          7
(7 rows)

In this case, the row that was NULL comes first. By default, the frame that last_value sees runs from the start of the partition through the current row, which for id 5 is just that row itself. You could include all rows in your frame:

SELECT id,
       row_number() OVER (ORDER BY CASE WHEN value IS NULL THEN 0 ELSE 1 END ASC, id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
       last_value(value) OVER (ORDER BY CASE WHEN value IS NULL THEN 0 ELSE 1 END ASC, id
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
 id | row_number | last_value
----+------------+------------
  5 |          1 | F
  0 |          2 | F
  1 |          3 | F
  2 |          4 | F
  3 |          5 | F
  4 |          6 | F
  6 |          7 | F
(7 rows)

But now the frame always ends at the final row, so every row gets F, which is clearly not what you want. If you're looking for the previous row's value, use lag().
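
As a rough sketch of the default frame mentioned above (assuming the same test table, and ordering by id alone for simplicity), the two window definitions below should behave identically, because PostgreSQL's default frame with an ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The column aliases are made up for the example.

```sql
SELECT id,
       value,
       -- frame left implicit
       last_value(value) OVER (ORDER BY id) AS implicit_frame,
       -- the same frame written out explicitly
       last_value(value) OVER (ORDER BY id
                               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame
FROM test
ORDER BY id;
-- Because id is unique, the frame always ends at the current row,
-- so both columns just echo value (including the NULL at id 5).
```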

Jeremy answered Oct 04 '22 10:10


So, thanks to Jeremy's explanations and another post (PostgreSQL last_value ignore nulls) I finally figured it out:

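-- ORDER BY id inside each partition makes first_value deterministic:
-- the non-NULL row always comes first in its group.
SELECT id, value,
       first_value(value) OVER (PARTITION BY t.isnull ORDER BY id) AS new_val
FROM (
    SELECT id, value,
           SUM(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY id) AS isnull
    FROM test
) t;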

This query returns the result I expected.
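
To see why this works, it may help to look at the inner query's running count on its own. The sketch below rebuilds the question's table inline as a CTE (so it runs without the real test table) and uses a made-up alias grp for the grouping column: each non-NULL value bumps the counter, while a NULL row leaves it unchanged and so lands in the same group as the last non-NULL row before it.

```sql
WITH test(id, value) AS (
    VALUES (0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'),
           (4, 'E'), (5, NULL), (6, 'F')
)
SELECT id,
       value,
       -- running count of non-NULL values up to and including this row
       SUM(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY id) AS grp
FROM test
ORDER BY id;
-- grp comes out as 1, 2, 3, 4, 5, 5, 6:
-- ids 4 and 5 share group 5, so the first value in that group ('E') fills id 5.
```

Unlike lag, this grouping also handles several NULLs in a row, since every NULL in the run keeps the same counter value.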

Gaëtan answered Oct 04 '22 09:10