I'm looking for something similar to na.locf() in the zoo package, but instead of always using the previous non-NA value I'd like to use the nearest non-NA value. Some example data:

```r
dat <- c(1, 3, NA, NA, 5, 7)
```
Replacing NA with na.locf (3 is carried forward):

```r
library(zoo)
na.locf(dat)
# 1 3 3 3 5 7
```
And na.locf with fromLast set to TRUE (5 is carried backwards):

```r
na.locf(dat, fromLast = TRUE)
# 1 3 5 5 5 7
```
But I want the nearest non-NA value to be used instead. In my example, this means that the 3 should be carried forward to the first NA and the 5 should be carried backwards to the second NA:

```
1 3 3 5 5 7
```
I have a solution coded up, but wanted to make sure that I wasn't reinventing the wheel. Is there something already floating around?
FYI, my current code is as follows. If nothing else, perhaps someone can suggest how to make it more efficient; I feel like I'm missing an obvious improvement:
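```r
f0 <- function(dat) {
  na.pos <- which(is.na(dat))
  # nothing to fill (no NAs) or nothing to fill from (all NAs)
  if (length(na.pos) %in% c(0, length(dat))) {
    return(dat)
  }
  non.na.pos <- setdiff(seq_along(dat), na.pos)
  # for each NA, the index (into non.na.pos) of the closest non-NA value;
  # which.min() returns the first minimum, so ties go to the left
  nearest.non.na.pos <- vapply(na.pos, function(x) {
    which.min(abs(non.na.pos - x))
  }, integer(1))
  dat[na.pos] <- dat[non.na.pos[nearest.non.na.pos]]
  dat
}
```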
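For comparison, here is a fully vectorized base R sketch that avoids calling a function once per NA. It is not taken from the thread, and the name fill_nearest is hypothetical. It records the position of the previous non-NA value with cummax() and the position of the next one with a reversed cummin(), then picks whichever is closer. Ties are resolved toward the previous value, as which.min does in the code above.

```r
# Hypothetical helper: fill each NA with the nearest non-NA value (base R only)
fill_nearest <- function(dat) {
  N <- length(dat)
  ok <- !is.na(dat)
  # nothing to fill (no NAs) or nothing to fill from (all NAs)
  if (!any(ok) || all(ok)) return(dat)
  idx <- seq_len(N)
  # position of the previous non-NA value (0 if there is none yet)
  prev <- cummax(ifelse(ok, idx, 0L))
  # position of the next non-NA value (N + 1 if there is none left)
  nxt <- rev(cummin(rev(ifelse(ok, idx, N + 1L))))
  # distance to each neighbour; a missing neighbour is infinitely far away
  left.dist <- ifelse(prev > 0, idx - prev, Inf)
  right.dist <- ifelse(nxt <= N, nxt - idx, Inf)
  # choose the closer neighbour, preferring the left one on ties
  src <- ifelse(left.dist <= right.dist, prev, nxt)
  dat[!ok] <- dat[src[!ok]]
  dat
}

fill_nearest(c(1, 3, NA, NA, 5, 7))  # 1 3 3 5 5 7
```

Leading and trailing NAs are handled by the Inf distances: a leading NA has no previous value, so it always takes the next one, and vice versa.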
To answer smci's questions below:
Update: It turns out that we're going in a different direction altogether, but this was still an interesting discussion. Thanks all!
The classic way to replace NAs in R is with the is.na() function. is.na() takes a vector or data frame as input and returns a logical object of the same shape that indicates whether each value is missing (TRUE) or not (FALSE). You can then use this logical object to select the missing values and assign them a zero.
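A minimal sketch of that pattern on a vector; the values in x are made up for illustration:

```r
# Example vector with two missing values
x <- c(4, NA, 7, NA)
# is.na() returns a logical vector of the same length
miss <- is.na(x)
print(miss)  # FALSE TRUE FALSE TRUE
# Use the logical vector as an index to overwrite only the missing entries
x[miss] <- 0
print(x)  # 4 0 7 0
```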
To replace NA with 0 in an R data frame, use the is.na() function to select all the NA values and assign them 0. Here, myDataframe is the data frame in which you would like to replace all NAs with 0.
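A short sketch for the data frame case, assuming all columns are numeric (with a character column, the 0 would be stored as "0"). The contents of myDataframe here are invented for illustration:

```r
# Hypothetical example data frame with missing values in two columns
myDataframe <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
# is.na() on a data frame returns a logical matrix; use it to index the NA cells
myDataframe[is.na(myDataframe)] <- 0
print(myDataframe)
```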
So, how do you replace missing values with base R code? First, identify the NAs with the is.na() function, selecting the column with the $ operator. Then use the min() function (with na.rm = TRUE) to replace the NAs with the column's lowest value.
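A minimal sketch of those two steps; the data frame df and its column score are hypothetical:

```r
# Hypothetical data frame with a numeric column 'score' containing NAs
df <- data.frame(score = c(12, NA, 7, NA, 9))
# Locate the NAs in the column (selected with the $ operator)
na_rows <- is.na(df$score)
# min() must ignore the NAs, otherwise it would return NA itself
df$score[na_rows] <- min(df$score, na.rm = TRUE)
print(df$score)  # 12 7 7 7 9
```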
One of the easiest ways to replace NAs in an R data frame is to combine the replace_na() function (from the tidyr package) with mean(). replace_na() replaces the missing values, while mean(), called with na.rm = TRUE, computes the value they are replaced with.
Here is a very fast one. It uses findInterval to find which two positions should be considered for each NA in your original data:
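To see what findInterval contributes, here is the core step traced on the question's example data. This simplified sketch only works because the example has no leading or trailing NAs; the full function handles those edge cases with all.inside = TRUE and clamped indices.

```r
dat <- c(1, 3, NA, NA, 5, 7)
na.pos <- which(is.na(dat))       # 3 4
non.na.pos <- which(!is.na(dat))  # 1 2 5 6
# for each NA, the index (into non.na.pos) of the last non-NA position before it
intervals <- findInterval(na.pos, non.na.pos)  # 2 2
left.pos <- non.na.pos[intervals]        # 2 2
right.pos <- non.na.pos[intervals + 1]   # 5 5
# take whichever neighbour is closer, preferring the left one on ties
src <- ifelse(na.pos - left.pos <= right.pos - na.pos, left.pos, right.pos)
filled <- dat[src]  # 3 5
```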
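```r
f1 <- function(dat) {
  N <- length(dat)
  na.pos <- which(is.na(dat))
  # nothing to fill (no NAs) or nothing to fill from (all NAs)
  if (length(na.pos) %in% c(0, N)) {
    return(dat)
  }
  non.na.pos <- which(!is.na(dat))
  M <- length(non.na.pos)
  # for each NA, the non-NA interval it falls in; all.inside = TRUE maps
  # NAs before the first / after the last non-NA onto the edge intervals
  intervals <- findInterval(na.pos, non.na.pos, all.inside = TRUE)
  # candidate neighbours on either side, clamped to valid indices of non.na.pos
  left.pos <- non.na.pos[pmax(1, intervals)]
  right.pos <- non.na.pos[pmin(M, intervals + 1)]
  left.dist <- na.pos - left.pos
  right.dist <- right.pos - na.pos
  # take the closer neighbour; ties go to the left (previous) value
  dat[na.pos] <- ifelse(left.dist <= right.dist, dat[left.pos], dat[right.pos])
  return(dat)
}
```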
And here I test it:
```r
# sample data, suggested by @JeffAllen
dat <- as.integer(runif(50000, min = 0, max = 10))
dat[dat == 0] <- NA

# computation times
system.time(r0 <- f0(dat))  # your function
#  user  system elapsed
#  5.52    0.00    5.52
system.time(r1 <- f1(dat))  # this function
#  user  system elapsed
#  0.01    0.00    0.03
identical(r0, r1)
# [1] TRUE
```