
Limit na.locf in zoo package

Tags: r, na, zoo, locf

I would like to carry the last observation forward for a variable, but across at most 2 observations. That is, for gaps of 3 or more NAs, I would fill only the first 2 NAs with the last observation and leave the rest as NA.

If I do this with zoo::na.locf, the maxgap parameter means that if the gap is longer than 2, no NA in it is replaced at all, not even the first 2. Is there an alternative?

x <- c(NA,3,4,5,6,NA,NA,NA,7,8)
zoo::na.locf(x, maxgap = 2) # Doesn't replace the first 2 NAs after the 6, because the gap of NAs has length 3.
Desired_output <- c(NA,3,4,5,6,6,6,NA,7,8)
user3507584 asked Jan 28 '23 11:01


2 Answers

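First apply na.locf0 with maxgap = 2, giving x0, in which only gaps of length 2 or less are filled. Then define a grouping variable g with rleid from the data.table package, so that each run of NA or non-NA values in x0 forms one group. For each group, use ave to apply keeper, which returns c(1, 1, NA, ..., NA) if the group is all NA and 1s otherwise. Multiplying the fully filled na.locf0(x) by that mask keeps the first 2 filled values of each long gap and sets the rest back to NA.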

library(data.table)
library(zoo)

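mg <- 2                                  # maximum number of NAs to fill per gap
x0 <- na.locf0(x, maxgap = mg)           # fills only gaps of length <= mg
g <- rleid(is.na(x0))                    # run id for each stretch of NA / non-NA
# mask: 1 keeps a value, NA blanks it; in an all-NA run keep only the first mg
keeper <- function(x) if (all(is.na(x))) ifelse(seq_along(x) <= mg, 1, NA) else 1
na.locf0(x) * ave(x0, g, FUN = keeper)   # full LOCF, then blank positions beyond mg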
## [1] NA  3  4  5  6  6  6 NA  7  8
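The masking idea can also be sketched in base R alone, as a rough illustration rather than a replacement for the code above. Here rle and sequence stand in for rleid, and a cummax index stands in for na.locf0; the names pos, mask, idx, filled and out are illustrative assumptions.

```r
x  <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)
mg <- 2                                   # fill at most mg NAs per gap

# Position of every element within its run of NA / non-NA values
pos <- sequence(rle(is.na(x))$lengths)

# Mask: NA for any missing value beyond the first mg of its run, 1 elsewhere
mask <- ifelse(is.na(x) & pos > mg, NA, 1)

# Unlimited LOCF in base R: index of the most recent non-NA value (0 if none)
idx    <- cummax(seq_along(x) * !is.na(x))
filled <- c(NA, x)[idx + 1]               # the prepended NA covers idx == 0

out <- filled * mask
out
# [1] NA  3  4  5  6  6  6 NA  7  8
```

The leading NA stays NA because there is no earlier observation to carry forward, so the mask value of 1 at that position has no effect.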
G. Grothendieck answered Feb 03 '23 07:02


A solution using base R:

ave(x, cumsum(!is.na(x)), FUN = function(i){ i[1:pmin(length(i), 3)] <- i[1]; i })
# [1] NA  3  4  5  6  6  6 NA  7  8

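cumsum(!is.na(x)) groups each run of NAs with the most recent non-NA value, so every group starts with a non-NA value (except a group of leading NAs, which stays NA).

function(i){ i[1:pmin(length(i), 3)] <- i[1]; i } overwrites the first three elements of each group with the group's leading value. The limit is 3 rather than 2 because the first element is the non-NA value itself, so only the first two NAs after it are filled.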
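The hard-coded 3 can be turned into a parameter. The following is a minimal sketch of that generalization; the helper name locf_limit and its argument n are hypothetical and not part of base R or zoo.

```r
# Hypothetical helper (name and arguments are illustrative):
# carry the last observation forward across at most `n` consecutive NAs.
locf_limit <- function(x, n = 2) {
  # Each non-NA value starts a new group; the NAs after it share its group.
  # Leading NAs fall into group 0 and stay NA because their first element is NA.
  g <- cumsum(!is.na(x))
  ave(x, g, FUN = function(i) {
    k <- min(length(i), n + 1)   # the leading value itself plus up to n NAs
    i[seq_len(k)] <- i[1]
    i
  })
}

x <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)
locf_limit(x, n = 2)
# [1] NA  3  4  5  6  6  6 NA  7  8
locf_limit(x, n = 1)
# [1] NA  3  4  5  6  6 NA NA  7  8
```

Using min and seq_len instead of pmin and 1:k makes no difference for this input; it only makes the scalar intent explicit.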

mt1022 answered Feb 03 '23 07:02