Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill missing values with first non-null following value in Redshift

I am on Redshift. Given the following data:

CREATE TABLE test (
id INT,
val1 INT,
val2 INT
);

INSERT INTO test VALUES
(1, 0,  NULL),
(2, 0,  NULL),
(3, 13, 1),
(4, 0,  NULL),
(5, 0,  NULL),
(6, 0,  NULL),
(7, 0,  NULL),
(8, 21, 2),
(9, 0,  NULL),
(10, 143,3)
;

I'd want to fill the missing val2 values with the first following non-null value, e.g.

   INSERT INTO results VALUES
    (1, 0,  1),
    (2, 0,  1),
    (3, 13, 1),
    (4, 0,  2),
    (5, 0,  2),
    (6, 0,  2),
    (7, 0,  2),
    (8, 21, 2),
    (9, 0,  3),
    (10,143,3)
    ;

What is the best way to accomplish this in Redshift/Postgres 8.0.2?

like image 274
Roberto Avatar asked Nov 10 '22 05:11

Roberto


1 Answers

One way I've been able to solve it (taking advantage of the fact that the non-null val2 values are sequential) is below. The performance is awful though, so any better solutions would be more than welcome.

SELECT
  t1.id
  , t1.val1
  , COALESCE(t1.val2, MIN(t2.val2)) as val2
FROM test t2 LEFT JOIN test t1 ON t2.id >= t1.id
WHERE t2.val2 IS NOT NULL
AND t1.val1 IS NOT NULL
GROUP BY 1, 2, t1.val2
ORDER BY t1.id
;

SQLFiddle link

like image 104
Roberto Avatar answered Nov 14 '22 22:11

Roberto