Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I efficiently select the previous non-null value?

Tags:

postgresql

I have a table in Postgres that looks like this:

# select * from p;
 id | value 
----+-------
  1 |   100
  2 |      
  3 |      
  4 |      
  5 |      
  6 |      
  7 |      
  8 |   200
  9 |          
(9 rows)

And I'd like to query to make it look like this:

# select * from p;
 id | value | new_value
----+-------+----------
  1 |   100 |    
  2 |       |    100
  3 |       |    100
  4 |       |    100
  5 |       |    100
  6 |       |    100
  7 |       |    100
  8 |   200 |    100
  9 |       |    200
(9 rows)

I can already do this with a subquery in the select, but in my real data I have 20k or more rows and it gets to be quite slow.

Is this possible to do in a window function? I'd love to use lag(), but it doesn't seem to support the IGNORE NULLS option.

select id, value, lag(value, 1) over (order by id) as new_value from p;
 id | value | new_value
----+-------+-----------
  1 |   100 |      
  2 |       |       100
  3 |       |      
  4 |       |
  5 |       |
  6 |       |
  7 |       |
  8 |   200 |
  9 |       |       200
(9 rows)
like image 824
adamlamar Avatar asked Sep 24 '13 17:09

adamlamar


People also ask

How do I find the second non null value in SQL?

2 Answers. Show activity on this post. You can do this with a more complex case statement: select (case when c5 is not null then coalesce(c4, c3, c2, c1) when c4 is not null then coalesce(c3, c2, c1) when c3 is not null then coalesce(c2, c1) else c1 end) . . .

How do I select data without NULL values in SQL?

Below is the syntax to filter the rows without a null value in a specified column. Syntax: SELECT * FROM <table_name> WHERE <column_name> IS NOT NULL; Example: SELECT * FROM demo_orders WHERE ORDER_DATE IS NOT NULL; --Will output the rows consisting of non null order_date values.

How do I find the first non null value in SQL?

SQL COALESCE – a function that returns the first defined, i.e. non-NULL value from its argument list. Usually one or more COALESCE function arguments is the column of the table the query is addressed to. Often a subquery is also an argument for a function.

How do I avoid NULL values in select query?

SELECT column_names FROM table_name WHERE column_name IS NOT NULL; Query: SELECT * FROM Student WHERE Name IS NOT NULL AND Department IS NOT NULL AND Roll_No IS NOT NULL; To exclude the null values from all the columns we used AND operator.


3 Answers

I found this answer for SQL Server that also works in Postgres. Having never done it before, I thought the technique was quite clever. Basically, he creates a custom partition for the windowing function by using a case statement inside of a nested query that increments a sum when the value is not null and leaves it alone otherwise. This allows one to delineate every null section with the same number as the previous non-null value. Here's the query:

SELECT
  id, value, value_partition, first_value(value) over (partition by value_partition order by id)
FROM (
  SELECT
    id,
    value,
    sum(case when value is null then 0 else 1 end) over (order by id) as value_partition

  FROM p
  ORDER BY id ASC
) as q

And the results:

 id | value | value_partition | first_value
----+-------+-----------------+-------------
  1 |   100 |               1 |         100
  2 |       |               1 |         100
  3 |       |               1 |         100
  4 |       |               1 |         100
  5 |       |               1 |         100
  6 |       |               1 |         100
  7 |       |               1 |         100
  8 |   200 |               2 |         200
  9 |       |               2 |         200
(9 rows)
like image 156
adamlamar Avatar answered Oct 11 '22 10:10

adamlamar


You can create a custom aggregate function in Postgres. Here's an example for the int type:

CREATE FUNCTION coalesce_agg_sfunc(state int, value int) RETURNS int AS
$$
    SELECT coalesce(value, state);
$$ LANGUAGE SQL;

CREATE AGGREGATE coalesce_agg(int) (
    SFUNC = coalesce_agg_sfunc,
    STYPE  = int);

Then query as usual.

SELECT *, coalesce_agg(b) over w, sum(b) over w FROM y
  WINDOW w AS (ORDER BY a);

a b coalesce_agg sum 
- - ------------ ---
a 0            0   0
b ∅            0   0
c 2            2   2
d 3            3   5
e ∅            3   5
f 5            5  10
(6 rows)
like image 36
Slobodan Pejic Avatar answered Oct 11 '22 11:10

Slobodan Pejic


Well, I can't guarantee this is the most efficient way, but works:

SELECT id, value, (
    SELECT p2.value
    FROM p p2
    WHERE p2.value IS NOT NULL AND p2.id <= p1.id
    ORDER BY p2.id DESC
    LIMIT 1
) AS new_value
FROM p p1 ORDER BY id;

The following index can improve the sub-query for large datasets:

CREATE INDEX idx_p_idvalue_nonnull ON p (id, value) WHERE value IS NOT NULL;

Assuming the value is sparse (e.g. there are a lot of nulls) it will run fine.

like image 4
MatheusOl Avatar answered Oct 11 '22 11:10

MatheusOl