Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data.table fill missing values from other rows by group

Tags:

r

na

data.table

row

# have
> aDT <- data.table(colA = c(1,1,1,1,2,2,2,2,3,3,3,3), colB = c(4,NA,NA,1,4,3,NA,NA,4,NA,2,NA))
> aDT
    colA colB
 1:    1    4
 2:    1   NA
 3:    1   NA
 4:    1    1
 5:    2    4
 6:    2    3
 7:    2   NA
 8:    2   NA
 9:    3    4
10:    3   NA
11:    3    2
12:    3   NA
# want
> bDT <- data.table(colA = c(1,1,1,1,2,2,2,2,3,3,3,3), colB = c(4,1,1,1,4,3,3,3,4,2,2,2))
> bDT
    colA colB
 1:    1    4
 2:    1    1
 3:    1    1
 4:    1    1
 5:    2    4
 6:    2    3
 7:    2    3
 8:    2    3
 9:    3    4
10:    3    2
11:    3    2
12:    3    2

Would like to fill missing values according to the algorithm below: within each group ('colA'),

  1. use the value from one row below, if it's still NA, keeps going until the last row within that group
  2. if all NAs in rows below, look at rows above (go up 1 row at a time)
  3. if all NAs, then NA

Since the dataset is quite large, algorithmic efficiency is part of consideration. Not sure if there's any package for this type of operation already. How to do it?

like image 915
LeGeniusII Avatar asked Dec 23 '22 09:12

LeGeniusII


1 Answers

With data.table and zoo:

library(data.table)
library(zoo)

# Last observation carried forward from last row of group
dt <- dt[, colB := na.locf0(colB, fromLast = TRUE), by = colA]

# Last observation carried forward for first row of group
dt[, colB := na.locf(colB), by = colA][]

Or in a single chain:

dt[, colB := na.locf0(colB, fromLast = TRUE), by = colA][
   , colB := na.locf(colB), by = colA][]

Both return:

    colA colB
 1:    1    4
 2:    1    1
 3:    1    1
 4:    1    1
 5:    2    4
 6:    2    3
 7:    2    3
 8:    2    3
 9:    3    4
10:    3    2
11:    3    2
12:    3    2

Data:

text <- "colA colB
    1    4
    1   NA
    1   NA
    1    1
    2    4
    2    3
    2   NA
    2   NA
    3    4
    3   NA
    3    2
    3   NA"

dt <- fread(input = text, stringsAsFactors = FALSE)
like image 171
tyluRp Avatar answered Dec 26 '22 00:12

tyluRp