I'm trying to missing values in a data.table column with the value below it using shift, but I can only get it to work if I first create a temporary variable. Is this the expected behavior? MWE:
library(data.table)
dt <- data.table(x=c(1, NA))
dt[is.na(x), x:=shift(x)]
# Fails
dt <- data.table(x=c(1, NA))
dt <- dt[, x.lag:=shift(x)]
dt[is.na(x), x:=x.lag]
# Works
I'm a little new to data.table, but I think the rolling join might be what you're after here. Presumably you want to be able to impute a data point when there are multiple missing values in sequence, in which case your shift method will just fill NA.
Your example is a little too minimal to really see what you're doing, but if I expand it a little to include a record column, where various x values are missing;
library(data.table)
dt <- data.table(record=1:10, x=c(1, NA, NA, 4, 5, 6, NA, NA, NA, 10))
> dt
record x
1: 1 1
2: 2 NA
3: 3 NA
4: 4 4
5: 5 5
6: 6 6
7: 7 NA
8: 8 NA
9: 9 NA
10: 10 10
Then create a copy with only the non-missing rows, and set a key as the x column
dtNA <- dt[!is.na(x)]
setkey(dtNA, record)
> dtNA
record x
1: 1 1
2: 4 4
3: 5 5
4: 6 6
5: 10 10
Then do a rolling join (whereby if a value is missing, the previous record is rolled forwards) on the full list of records
dtNA[data.table(record=dt$record, key="record"), roll=TRUE]
record x
1: 1 1
2: 2 1
3: 3 1
4: 4 4
5: 5 5
6: 6 6
7: 7 6
8: 8 6
9: 9 6
10: 10 10
Compared to your method which produces the following (still has NA values in x);
dt[, x.lag:=shift(x)]
dt[is.na(x), x:=x.lag]
> dt
record x x.lag
1: 1 1 NA
2: 2 1 1
3: 3 NA NA
4: 4 4 NA
5: 5 5 4
6: 6 6 5
7: 7 6 6
8: 8 NA NA
9: 9 NA NA
10: 10 10 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With