I have a little table to try to understand how the LAST_VALUE
function works in PostgreSQL. It looks like this:
id | value
----+--------
0 | A
1 | B
2 | C
3 | D
4 | E
5 | [null]
6 | F
What I want to do is to use LAST_VALUE
to fill the NULL value with the precedent non-NULL value, so the result should be this:
id | value
----+--------
0 | A
1 | B
2 | C
3 | D
4 | E
5 | E
6 | F
The query I tried to accomplish that is:
SELECT LAST_VALUE(value)
OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;
From what I understand of the LAST_VALUE
function, it takes all the rows before the current one as a window, sorts them following the ORDER By
thing and then returns the last row of the window. With my ORDER BY
, all the rows containing a NULL should be put on top of the window, so LAST_VALUE
should return the last non NULL value. But it doesn't.
I am clearly missing something. Please help.
I'm not sure last_value will do what you want. It would be better to use lag:
select id,
coalesce(value, lag(value) OVER (order by id))
FROM test;
id | coalesce
----+----------
0 | A
1 | B
2 | C
3 | D
4 | E
5 | E
6 | F
(7 rows)
last_value will return the last value of the current frame. Since you partitioned by id, there's only ever one value in the current frame. lag will return the previous row (by default) in the frame, which seems to be exactly what you want.
To expand on this answer a bit, you can use row_number() to give you a good idea of the frame you are looking at. For your proposed solution, look at the row numbers for each row, when you partition by id:
SELECT id, row_number() OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;
id | row_number
----+------------
0 | 1
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
(7 rows)
Each row is its own frame, so you won't be able to get anything values from other rows.
If we don't partition by id, but still use your ordering, you can see why this still won't work for last_value:
SELECT id, row_number() OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC, id)
FROM test;
id | row_number
----+------------
5 | 1
0 | 2
1 | 3
2 | 4
3 | 5
4 | 6
6 | 7
(7 rows)
In this case, the row that was NULL is first. By default, last_value will include rows up to the current row, which in this case is just the current row for id 5. You could include all rows in your frame:
SELECT id,
row_number() OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC,
id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
last_value(value) OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC, id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
id | row_number | last_value
----+------------+------------
5 | 1 | F
0 | 2 | F
1 | 3 | F
2 | 4 | F
3 | 5 | F
4 | 6 | F
6 | 7 | F
(7 rows)
But now the last row is the end of the frame and it's clearly not what you want. If you're looking for the previous row, choose lag().
So, thanks to Jeremy's explanations and another post (PostgreSQL last_value ignore nulls) I finally figured it out:
SELECT id, value, first_value(value) OVER (partition by t.isnull) AS new_val
FROM(
SELECT id, value, SUM (CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY id) AS isnull
FROM test) t;
This query returns the result I expected.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With