I have a vector containing NA's, including at the boundaries:
x <- c(NA, -1, 1,-1, 1, NA, -1, 2, NA, NA)
I want the outcome to be:
c(-3, -1, 1,-1, 1, 0, -1, 2, 5, 8)
In other words, I want to fill both the inner and the boundary NA's linearly (perhaps "interpolation" is not the right word, since filling the boundary NA's is really extrapolation).
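To make the expected values concrete, here is a small sketch of the arithmetic behind them. It assumes the boundary NA's should continue the straight line through the two nearest non-NA values; the variable names are only illustrative.

```r
x <- c(NA, -1, 1, -1, 1, NA, -1, 2, NA, NA)

# Slope between the two leftmost and the two rightmost observed values
left_slope  <- x[3] - x[2]   # 1 - (-1) = 2
right_slope <- x[8] - x[7]   # 2 - (-1) = 3

# Continue each line outwards, one step per missing position
left_fill  <- x[2] - left_slope * 1     # -3
right_fill <- x[8] + right_slope * 1:2  # 5 8

# The inner NA sits halfway between its two neighbours
inner_fill <- (x[5] + x[7]) / 2         # 0
```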
I tried na.fill(x, "extend") from the zoo package, but it does not give the boundary values I want; it just repeats the leftmost or rightmost non-NA value:
na.fill(x,"extend")
and the output is
[1] -1 -1 1 -1 1 0 -1 2 2 2
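For comparison, the same two boundary behaviours can be reproduced in base R with approx(), which may help show where the repeated values come from. This is only an illustration, not what na.fill() does internally: rule = 1 returns NA outside the observed range, and rule = 2 repeats the value at the nearest observed end.

```r
x <- c(NA, -1, 1, -1, 1, NA, -1, 2, NA, NA)
idx <- seq_along(x)
ok  <- !is.na(x)

# rule = 1: interpolate inside the observed range, NA outside it
inner_only <- approx(idx[ok], x[ok], xout = idx, rule = 1)$y
# NA -1  1 -1  1  0 -1  2 NA NA

# rule = 2: outside the range, repeat the closest observed value
const_ext <- approx(idx[ok], x[ok], xout = idx, rule = 2)$y
# -1 -1  1 -1  1  0 -1  2  2  2
```

Neither rule continues the slope at the ends, which is the missing piece here.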
I also checked other functions for filling NA's, such as na.approx(), na.locf(), etc., but none of them gives the result I want.
na.spline() does fill the boundary NA's, but the values it produces there vary far too much:
na.spline(x)
The output is:
[1] -15.9475983 -1.0000000 1.0000000 -1.0000000 1.0000000 0.3400655 -1.0000000 2.0000000
[9] 13.1441048 35.9323144
The boundary values are far too large in magnitude. Can anyone help me out? Thanks in advance!
You can use na.spline() from the zoo library:
na.spline(x)
[1] 0.0 0.5 1.0 1.5 2.0 2.5
Data for the original question:
x <- c(0, NA, 1, NA, 2, NA)
Given the data and expected output after the question's edit, I believe the following function does it. It fills in the interior NA's with approxfun and then treats the extremes one by one.
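na.extrapol <- function(y){
  # Fill interior NA's by linear interpolation; boundary NA's stay NA
  x <- seq_along(y)
  f <- approxfun(x[!is.na(y)], y[!is.na(y)])
  y[is.na(y)] <- f(x[is.na(y)])
  # Runs of NA / non-NA that remain after the interior fill
  r <- rle(is.na(y))
  if(r$values[1]){
    # Leading NA's: extend the line through the first two filled values
    Y <- y[r$lengths[1] + 1:2]
    X <- seq_len(r$lengths[1])
    y[rev(X)] <- Y[1] - diff(Y)*X
  }
  n <- length(r$lengths)
  if(r$values[n]){
    # Trailing NA's: extend the line through the last two filled values
    s <- sum(r$lengths[-n])
    Y <- y[s - 1:0]
    X <- seq_len(r$lengths[n])
    y[s + X] <- Y[2] + diff(Y)*X
  }
  y
}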
x <- c(NA, -1, 1,-1, 1, NA, -1, 2, NA, NA)
na.extrapol(x)
#[1] -3 -1 1 -1 1 0 -1 2 5 8
x2 <- c(NA, NA, -1, 1,-1, 1, NA, -1, 2, NA, NA)
na.extrapol(x2)
#[1] -5 -3 -1 1 -1 1 0 -1 2 5 8
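As a possible variation, here is a minimal base R sketch of the same idea written with index vectors instead of rle(). The name na_linear_fill is hypothetical, and it assumes that fewer than two non-NA values should be treated as an error, a case the function above does not check for.

```r
na_linear_fill <- function(y) {
  ok <- which(!is.na(y))
  if (length(ok) < 2) stop("need at least two non-NA values")  # assumed policy
  idx <- seq_along(y)

  # Interior: linear interpolation (approx() gives NA outside the observed range)
  out <- approx(ok, y[ok], xout = idx)$y

  # Left boundary: continue the line through the first two observed points
  left <- idx < ok[1]
  slope_l <- (y[ok[2]] - y[ok[1]]) / (ok[2] - ok[1])
  out[left] <- y[ok[1]] + slope_l * (idx[left] - ok[1])

  # Right boundary: continue the line through the last two observed points
  k <- length(ok)
  right <- idx > ok[k]
  slope_r <- (y[ok[k]] - y[ok[k - 1]]) / (ok[k] - ok[k - 1])
  out[right] <- y[ok[k]] + slope_r * (idx[right] - ok[k])

  out
}

x <- c(NA, -1, 1, -1, 1, NA, -1, 2, NA, NA)
na_linear_fill(x)
#[1] -3 -1  1 -1  1  0 -1  2  5  8
```

Because the interior is filled linearly, the slope between the first (or last) two observed points is the same slope na.extrapol obtains from the two adjacent filled values, so both approaches should agree on inputs like these.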