Fill in NULL column values with lag(N) values

I have a simple table that is missing values in the "somebody" column. I want to fill each NULL with the most recent non-NULL value from an earlier row, in ascending order of the id field, but never with a value from a later row (items in the past may be different). For the sake of the experiment (my actual query is far more complex), I cannot simply use an UPDATE query to fill the table; I have to do this as a SELECT.

CREATE TABLE lag_test (id serial primary key, natural_key integer, somebody text);

INSERT INTO lag_test(natural_key, somebody)
VALUES (1, NULL), (1, 'Kirk'), (1, NULL), (2, 'Roybal'), (2, NULL), (2, NULL);

Sample code creates a table like this:

id  natural_key  somebody
--  -----------  --------
1   1            NULL
2   1            Kirk
3   1            NULL
4   2            Roybal
5   2            NULL
6   2            NULL

So far, I have this:

SELECT id,
       natural_key,
       COALESCE(somebody, lag(somebody) OVER (PARTITION BY natural_key)) somebody
FROM lag_test
ORDER BY natural_key, id;

Which returns this:

id  natural_key  somebody
--  -----------  --------
1   1            NULL
2   1            Kirk
3   1            Kirk
4   2            Roybal
5   2            Roybal
6   2            NULL
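
One thing that may be worth ruling out first, offered as a sketch rather than a diagnosis: the window above has no ORDER BY, so PostgreSQL does not guarantee the order in which lag() sees the rows of each partition. Ordering the window by id makes the result deterministic. It does not fix row 6, though, because lag() with the default offset reads only the immediately preceding row, and for row 6 that is the NULL in row 5:

```sql
-- Same query, with an explicit ordering inside the window.
-- lag(somebody) is shorthand for lag(somebody, 1): one row back, no further.
SELECT id,
       natural_key,
       COALESCE(somebody,
                lag(somebody) OVER (PARTITION BY natural_key ORDER BY id)) AS somebody
FROM lag_test
ORDER BY natural_key, id;
```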

I would like it to return this:

id  natural_key  somebody
--  -----------  --------
1   1            NULL
2   1            Kirk
3   1            Kirk
4   2            Roybal
5   2            Roybal
6   2            Roybal

The basic question is: How do I get lag() to work N rows into the past so that the row id:6,natural_key:2 receives a value for the "somebody" column?

I'm working with PG 9.3.4.

Update: Reading the docs, I found out that lag takes an optional parameter [offset] that I was able to use to some extent. Hope somebody can help me refine this:

SELECT id,
       natural_key,
       COALESCE(somebody,
                lag(somebody, 1) OVER (PARTITION BY natural_key),
                lag(somebody, 2) OVER (PARTITION BY natural_key),
                lag(somebody, 3) OVER (PARTITION BY natural_key)
               ) somebody
FROM lag_test
ORDER BY natural_key, id;

This solves the problem for the limited test set shown in the OP. The real question has not yet been answered.

Edit 2:

I also figured out this little gem.

SELECT id, natural_key, 
  regexp_replace(string_agg(somebody, '|') OVER (ORDER BY id)::text, '^.*\|', '', 'g') somebody 
FROM lag_test 
ORDER BY natural_key, id;

This only works for data that doesn't contain a pipe ("|") character. It's kinda hacky, but performance is good.
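
A delimiter-free variant of the same idea, sketched under the assumption that array_remove() is available (it was added in PostgreSQL 9.3): collect the running values into an array, drop the NULLs, and take the last element. The subquery alias s and the column name vals are illustrative names, not part of the original query. The PARTITION BY natural_key is also an addition, so that a value does not carry over from one natural_key to the next.

```sql
-- Running array of all values seen so far in this natural_key (NULLs removed);
-- its last element is the most recent non-NULL somebody.
SELECT id,
       natural_key,
       vals[array_upper(vals, 1)] AS somebody  -- NULL when vals is empty
FROM (
    SELECT id,
           natural_key,
           array_remove(
               array_agg(somebody) OVER (PARTITION BY natural_key ORDER BY id),
               NULL
           ) AS vals
    FROM lag_test
) AS s
ORDER BY natural_key, id;
```

Because every row carries an array of all earlier values in its partition, this may get expensive on long partitions; it is meant as an illustration of the technique rather than a tuned solution.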

Kirk Roybal, asked May 28 '14 20:05


2 Answers

Here is one that produces the correct output, according to what you show. To extend the test, I added a few more values, so that one natural_key has two different names with a gap between them.

CREATE TABLE lag_test(
  id serial primary key,
  natural_key integer,
  somebody text);

INSERT  INTO lag_test( natural_key, somebody )
VALUES  (1, NULL), (1, 'Kirk'), (1, NULL), (1, NULL), (1, 'James'), (1, NULL),
        (2, 'Roybal'), (2, NULL), (2, NULL),
        (3, NULL), (3, 'Truman'), (3, NULL), (3, NULL);

I can't tell whether this can be done with analytic (window) functions, at least not with LAG, but here is one solution that uses one join and one subquery. It's fairly simple, really.

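-- For each row, join the latest earlier row with the same natural_key
-- that has a non-NULL somebody, and use its value to fill the gap.
SELECT  lt.id, lt.natural_key,
        CASE WHEN lt.somebody IS NULL
            THEN lt1.somebody
            ELSE lt.somebody END AS somebody
  FROM  lag_test lt
  LEFT JOIN lag_test lt1
    ON  lt1.id = (
        SELECT  MAX(id)
          FROM  lag_test
         WHERE  natural_key = lt.natural_key
           AND  id < lt.id
           AND  somebody IS NOT NULL)
 ORDER BY lt.natural_key, lt.id;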

See the SQL Fiddle sandbox.
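
For what it's worth, this does seem to be doable with window functions, just not with lag() alone. A minimal sketch, assuming the fill should restart for each natural_key: count(somebody) ignores NULLs, so a running count ordered by id stays constant across a run of NULLs and increases at every non-NULL row. That running count (called grp here, an illustrative name, as is the alias s) labels each non-NULL row together with the NULL rows that follow it, and first_value() then returns the value at the start of each labelled group.

```sql
-- Step 1 (inner query): grp = number of non-NULL values seen so far
--                       within this natural_key, in id order.
-- Step 2 (outer query): within each (natural_key, grp) group, the first row
--                       by id is the non-NULL row that opened the group.
SELECT id,
       natural_key,
       first_value(somebody) OVER (PARTITION BY natural_key, grp
                                   ORDER BY id) AS somebody
FROM (
    SELECT id,
           natural_key,
           somebody,
           count(somebody) OVER (PARTITION BY natural_key
                                 ORDER BY id) AS grp
    FROM lag_test
) AS s
ORDER BY natural_key, id;
```

Rows that come before the first non-NULL value of a natural_key fall into group 0 and stay NULL, which is what the question's desired output shows for row 1.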

TommCatt, answered Oct 03 '22 10:10


Can't test it but try this:

SELECT lt.id, lt.natural_key, l.somebody
FROM lag_test lt
INNER JOIN (SELECT lt2.natural_key, lt2.somebody
            FROM lag_test lt2
            INNER JOIN (SELECT MAX(id) AS LastID, somebody
                        FROM lag_test WHERE somebody IS NOT NULL
                        GROUP BY somebody) AS ls ON lt2.id = ls.LastID
           ) AS l ON lt.natural_key = l.natural_key;

It may not be the most compact way, but it works for me.

This is the result:

id  natural_key  somebody
--  -----------  --------
1   1            Kirk
2   1            Kirk
3   1            Kirk
4   1            Kirk
5   2            Roybal
6   2            Roybal
7   2            Roybal

ericpap, answered Oct 03 '22 12:10