R data.table: update with shift() does not work as expected

Question

I'm trying to fill missing values in a data.table column with the value from the previous row using shift(), but I can only get it to work if I first create a temporary variable. Is this the expected behavior? MWE:

library(data.table)

dt <- data.table(x=c(1, NA))
dt[is.na(x), x:=shift(x)]
# Fails: x in the second row is still NA

dt <- data.table(x=c(1, NA))
dt <- dt[, x.lag:=shift(x)]
dt[is.na(x), x:=x.lag]
# Works: x in the second row is now 1

Jonathan Carroll · Accepted Answer

I'm fairly new to data.table, but I think a rolling join might be what you're after here. Presumably you want to be able to impute a data point when there are multiple missing values in sequence, in which case your shift method will only fill the first NA of each run and leave the rest as NA.

Your example is a little too minimal to really see what you're doing, but suppose I expand it to include a record column, with various x values missing:

library(data.table)
dt <- data.table(record=1:10, x=c(1, NA, NA, 4, 5, 6, NA, NA, NA, 10))
> dt
    record  x
 1:      1  1
 2:      2 NA
 3:      3 NA
 4:      4  4
 5:      5  5
 6:      6  6
 7:      7 NA
 8:      8 NA
 9:      9 NA
10:     10 10

Then create a copy with only the non-missing rows, and set its key to the record column:

dtNA <- dt[!is.na(x)]
setkey(dtNA, record)
> dtNA
   record  x
1:      1  1
2:      4  4
3:      5  5
4:      6  6
5:     10 10

Then do a rolling join (whereby if a value is missing, the last observed value is rolled forward) on the full list of records:

dtNA[data.table(record=dt$record, key="record"), roll=TRUE]
    record  x
 1:      1  1
 2:      2  1
 3:      3  1
 4:      4  4
 5:      5  5
 6:      6  6
 7:      7  6
 8:      8  6
 9:      9  6
10:     10 10
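
The join above only prints the filled values; it does not change dt. As a minimal sketch (not part of the original answer), the rolled values can be written back into dt as a new column. The column names x_roll and x_locf are made up for illustration, and the nafill() cross-check assumes data.table 1.12.4 or later and a numeric column.

```r
library(data.table)

# Same example data as above
dt <- data.table(record = 1:10, x = c(1, NA, NA, 4, 5, 6, NA, NA, NA, 10))

# Keyed copy holding only the observed values
dtNA <- dt[!is.na(x)]
setkey(dtNA, record)

# Rolling join against every record; in j, x refers to dtNA's column,
# so this returns one rolled-forward value per row of dt
rolled <- dtNA[dt, x, on = "record", roll = TRUE]
dt[, x_roll := rolled]

# Cross-check: last observation carried forward on the original column
dt[, x_locf := nafill(x, type = "locf")]

dt
```

Both new columns should agree, which suggests that for a single numeric column nafill(type = "locf") is the shorter route, while the rolling join generalises to cases where the records are not simply consecutive rows.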

Compare that with your method, which produces the following (x still has NA values):

dt[, x.lag:=shift(x)]
dt[is.na(x), x:=x.lag]
> dt
    record  x x.lag
 1:      1  1    NA
 2:      2  1     1
 3:      3 NA    NA
 4:      4  4    NA
 5:      5  5     4
 6:      6  6     5
 7:      7  6     6
 8:      8 NA    NA
 9:      9 NA    NA
10:     10 10    NA
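
As for why the one-step version in the question leaves x unchanged: when i is supplied, j is evaluated only on the rows that i selects, so shift(x) there sees just the NA rows and has nothing to lag from. The sketch below illustrates this on the question's two-row example and shows one possible single-step alternative; fifelse() assumes data.table 1.12.4 or later (base ifelse() should behave the same here).

```r
library(data.table)

dt <- data.table(x = c(1, NA))

# j only sees the rows selected by i: here a single row whose x is NA,
# and shift() of a length-1 vector is just NA
dt[is.na(x), shift(x)]

# Evaluate shift() over the whole column, then choose per row
dt[, x := fifelse(is.na(x), shift(x), x)]
dt
```

After the second call x holds 1 in both rows, without a temporary x.lag column. Like the two-step version, this still fills only the first NA in a run of consecutive NAs.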

Tags: r, data.table

pbaylis
