I have a table in Postgres that looks like this:
# select * from p;
 id | value
----+-------
  1 |   100
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |   200
  9 |
(9 rows)
And I'd like a query that makes it look like this:
# select * from p;
 id | value | new_value
----+-------+-----------
  1 |   100 |
  2 |       |       100
  3 |       |       100
  4 |       |       100
  5 |       |       100
  6 |       |       100
  7 |       |       100
  8 |   200 |       100
  9 |       |       200
(9 rows)
I can already do this with a subquery in the select, but in my real data I have 20k or more rows and it gets to be quite slow.
Is this possible to do in a window function? I'd love to use lag(), but it doesn't seem to support the IGNORE NULLS option.
select id, value, lag(value, 1) over (order by id) as new_value from p;
 id | value | new_value
----+-------+-----------
  1 |   100 |
  2 |       |       100
  3 |       |
  4 |       |
  5 |       |
  6 |       |
  7 |       |
  8 |   200 |
  9 |       |       200
(9 rows)
I found this answer for SQL Server that also works in Postgres. I had never done it before, and I thought the technique was quite clever. Basically, he builds a custom partition for the window function: a nested query runs a sum over a CASE expression that increments when the value is not null and stays the same otherwise. As a result, every run of nulls gets the same partition number as the non-null value before it. Here's the query:
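SELECT
    id, value, value_partition,
    first_value(value) over (partition by value_partition order by id)
FROM (
    SELECT
        id,
        value,
        -- running count of non-null values: it increments on each non-null row,
        -- so every run of NULLs shares the number of the value before it
        sum(case when value is null then 0 else 1 end) over (order by id) as value_partition
    FROM p
) as q
-- sort in the outer query; an ORDER BY inside the subquery does not fix the final order
ORDER BY id;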
And the results:
 id | value | value_partition | first_value
----+-------+-----------------+-------------
  1 |   100 |               1 |         100
  2 |       |               1 |         100
  3 |       |               1 |         100
  4 |       |               1 |         100
  5 |       |               1 |         100
  6 |       |               1 |         100
  7 |       |               1 |         100
  8 |   200 |               2 |         200
  9 |       |               2 |         200
(9 rows)
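The first_value column keeps a non-null row's own value (row 8 shows 200), while the desired output in the question only looks at earlier rows (row 8 should show 100 and row 1 should be empty). One way to close that gap, sketched here under the assumption of the question's p(id, value) table, is to shift the filled value down one row with lag(). Window function calls cannot be nested, so the fill happens in a subquery first. The sketch also swaps the sum(case ...) expression for count(value), which counts only non-null values and so yields the same group numbers. The sample rows are inlined as a CTE named p so the query runs on its own; remove the WITH clause to run it against the real table.

```sql
-- Sample data from the question, inlined so the query is self-contained.
-- Remove this WITH clause to query the real table p instead.
WITH p(id, value) AS (
    VALUES (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
           (6, NULL), (7, NULL), (8, 200), (9, NULL)
)
SELECT id, value,
       -- previous row's filled value = last non-null from strictly earlier rows
       lag(filled) OVER (ORDER BY id) AS new_value
FROM (
    SELECT id, value,
           -- each group starts at a non-null row (except a leading group 0 of
           -- NULLs), so the group's first value is the one to carry forward
           first_value(value) OVER (PARTITION BY grp ORDER BY id) AS filled
    FROM (
        SELECT id, value,
               -- count() skips NULLs, so it only increments on non-null rows
               count(value) OVER (ORDER BY id) AS grp
        FROM p
    ) AS grouped
) AS filled_rows
ORDER BY id;
```

On the sample data this returns NULL for id 1, 100 for ids 2 through 8, and 200 for id 9, which matches the desired new_value column.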
You can create a custom aggregate function in Postgres. Here's an example for the int type:
CREATE FUNCTION coalesce_agg_sfunc(state int, value int) RETURNS int AS
$$
SELECT coalesce(value, state);
$$ LANGUAGE SQL;
CREATE AGGREGATE coalesce_agg(int) (
SFUNC = coalesce_agg_sfunc,
STYPE = int);
Then query as usual, here against an example table y with columns a and b (∅ marks NULL in the output):
SELECT *, coalesce_agg(b) over w, sum(b) over w FROM y
WINDOW w AS (ORDER BY a);
a  b  coalesce_agg  sum
-  -  ------------  ---
a  0             0    0
b  ∅             0    0
c  2             2    2
d  3             3    5
e  ∅             3    5
f  5             5   10
(6 rows)
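To apply this idea to the question's table and get exactly the desired new_value column, the window frame can be narrowed so it stops one row before the current row; on the first row that frame is empty, so the aggregate yields NULL. This sketch also generalizes coalesce_agg to any column type via anyelement. The names last_nonnull_sfunc and last_nonnull are hypothetical, chosen so they do not clash with the int version above, and the sample data is inlined as a CTE named p.

```sql
-- Hypothetical polymorphic variant of coalesce_agg that works for any column type.
-- Not STRICT, so it is also called while the state is still NULL.
CREATE FUNCTION last_nonnull_sfunc(state anyelement, value anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE AS
$$
    -- keep the incoming value if it is non-null, otherwise carry the old state
    SELECT coalesce(value, state);
$$;

CREATE AGGREGATE last_nonnull(anyelement) (
    SFUNC = last_nonnull_sfunc,
    STYPE = anyelement
);

-- Sample data from the question inlined as a CTE; drop it to use table p.
WITH p(id, value) AS (
    VALUES (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
           (6, NULL), (7, NULL), (8, 200), (9, NULL)
)
SELECT id, value,
       -- frame ends one row before the current row, so only earlier rows count
       last_nonnull(value) OVER (ORDER BY id
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_value
FROM p
ORDER BY id;
```

With the default frame (up to and including the current row) the same aggregate would instead behave like the coalesce_agg column above, returning a non-null row's own value.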
Well, I can't guarantee this is the most efficient way, but it works:
SELECT id, value, (
SELECT p2.value
FROM p p2
WHERE p2.value IS NOT NULL AND p2.id <= p1.id
ORDER BY p2.id DESC
LIMIT 1
) AS new_value
FROM p p1 ORDER BY id;
The following index can improve the sub-query for large datasets:
CREATE INDEX idx_p_idvalue_nonnull ON p (id, value) WHERE value IS NOT NULL;
Assuming value is sparse (i.e. most rows are NULL), it will run fine.
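Because the subquery compares p2.id <= p1.id, a non-null row picks up its own value (id 1 gets 100 and id 8 gets 200), which differs slightly from the desired output in the question. A minimal sketch of the same lookup with a strict comparison, written as a LEFT JOIN LATERAL (PostgreSQL 9.3 or later) instead of a subquery in the select list. The sample data is inlined as a CTE named p so it runs standalone; the partial index only helps once the WITH clause is dropped and the query runs against the real table.

```sql
-- Sample data from the question inlined as a CTE; drop it to use table p.
WITH p(id, value) AS (
    VALUES (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
           (6, NULL), (7, NULL), (8, 200), (9, NULL)
)
SELECT p1.id, p1.value, prev.value AS new_value
FROM p AS p1
LEFT JOIN LATERAL (
    -- latest non-null value from a strictly earlier row
    SELECT p2.value
    FROM p AS p2
    WHERE p2.value IS NOT NULL
      AND p2.id < p1.id
    ORDER BY p2.id DESC
    LIMIT 1
) AS prev ON true   -- LEFT JOIN keeps rows with no earlier non-null (new_value NULL)
ORDER BY p1.id;
```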