Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?

Question

When formatting data for time-series analysis, a common need is to impute missing values by filling-forward values over time (also called Last-Observation-Carried-Forward / LOCF ).

While data-analysis environments typically provide that functionality (e.g. Pandas fillna() ), for larger data-sets it can be more efficient to compute it in SQL (e.g. utilizing a data-parallel data-warehouse appliance).

For example, consider:

    | UNIT | TIME | VALUE |
    |------|------|-------|
    | 1    | 1    | NULL  |
    | 1    | 2    | .5    |
    | 1    | 3    | NULL  |
    | 1    | 4    | NULL  |
    | 1    | 5    | .2    |
    | 1    | 6    | NULL  |
    | 2    | 1    | .6    |
    | 2    | 2    | NULL  |

which, after filling-forward the VALUE column over TIME (independently for each UNIT) yields:

    | UNIT | TIME | VALUE |
    |------|------|-------|
    | 1    | 1    | NULL  |
    | 1    | 2    | .5    |
    | 1    | 3    | .5    |
    | 1    | 4    | .5    |
    | 1    | 5    | .2    |
    | 1    | 6    | .2    |
    | 2    | 1    | .6    |
    | 2    | 2    | .6    |

(Notice the initial NULL for UNIT 1 can't be imputed as there is no prior value)

Time could also be a timestamp or datetime type column.

Vincent Zoonekynd · Accepted Answer

With some databases, e.g., Postgres, you can define your own aggregation functions. LOCF is just a running COALESCE.

CREATE OR REPLACE FUNCTION locf_state( FLOAT, FLOAT )
RETURNS FLOAT
LANGUAGE SQL
AS $f$
  SELECT COALESCE($2,$1)
$f$;

CREATE AGGREGATE locf(FLOAT) (
  SFUNC = locf_state,
  STYPE = FLOAT
);

The query is then more readable:

SELECT unit, time, 
       locf(value) OVER( PARTITION BY unit ORDER BY time )
FROM   mytable;

SQLFiddle: http://sqlfiddle.com/#!15/2c73b/1/0

Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?

Tags:

sql

time-series

data-warehouse

data-cleaning

DavidJ

1 Answers

Vincent Zoonekynd

Recent Activity

Donate For Us

Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?

Tags:

sql

time-series

data-warehouse

data-cleaning

DavidJ

1 Answers

Vincent Zoonekynd

Related questions

Recent Activity

Donate For Us