Filling in NA values based on other values in the column

Question

I have a data.table with a large number of missing values. I would like to fill these by adding or subtracting values from the available values in the data.table. In particular, consider this data:

> test = data.table(id=c("A","A","A","A","A","B","B","B","B","B"), x=c(NA,NA,0,NA,NA,NA,NA,0,NA,NA))
> test
    id  x
 1:  A NA
 2:  A NA
 3:  A  0
 4:  A NA
 5:  A NA
 6:  B NA
 7:  B NA
 8:  B  0
 9:  B NA
10:  B NA

I need an operation which transforms this into that:

    id  x
1:  A -2
2:  A -1
3:  A  0
4:  A  1
5:  A  2
6:  B -2
7:  B -1
8:  B  0
9:  B  1
10: B  2

Basically a version of na.locf which increments the last value rather than repeating it.

akrun · Accepted Answer

We can group by 'id', and take the difference of the row number (seq_len(.N)) with the position (which) in 'x' where it is 0 (!x). I am wrapping with as.numeric as the 'x' column is numeric in the input dataset, but from the difference, it is converted to 'integer'. If there is a clash in class while assigning (:=), the data.table will show error as it needs matching class.

test[, x:= as.numeric(seq_len(.N)-which(!x)), id]
test
#    id  x
# 1:  A -2
# 2:  A -1
# 3:  A  0
# 4:  A  1
# 5:  A  2
# 6:  B -2
# 7:  B -1
# 8:  B  0
# 9:  B  1
#10:  B  2

!x is otherwise written more clearly as x==0. It returns a logical vector of TRUE/FALSE. If there are NA values, it will remain as NA. By wrapping with which, we get the position of 0 value. In the example, it is 3 for each 'id'.

Filling in NA values based on other values in the column

Tags:

r

missing-data

data.table

time-series

phildeutsch

1 Answers

akrun

Recent Activity

Donate For Us

Filling in NA values based on other values in the column

Tags:

r

missing-data

data.table

time-series

phildeutsch

1 Answers

akrun

Related questions

Recent Activity

Donate For Us