I would like to impute values for NA observations at the beginning of the array, using a linear approximation of the following two non-NA observations to extrapolate the missing value. Then do the same for the NA observations at the end of the array, using the preceding two non-NA observations.
A reproducible example of my df:
M=matrix(sample(1:9,10*10,T),10);M[sample(1:length(M),0.5*length(M),F)]=NA;dimnames(M)=list(paste(rep("City",dim(M)[1]),1:dim(M)[1],sep=""),paste(rep("Year",dim(M)[2]),1:dim(M)[2],sep=""))
M
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4 5 NA 3 NA NA NA 5 NA
City2 6 NA 3 3 NA 4 6 NA NA 7
City3 NA 7 NA 8 8 NA NA 8 NA 5
City4 3 5 3 NA NA 3 5 9 8 7
City5 4 6 6 NA NA 8 NA 7 1 NA
City6 NA NA NA NA 4 NA 8 3 6 7
City7 9 3 NA NA NA NA NA 4 NA NA
City8 5 6 9 8 5 NA NA 1 4 NA
City9 NA NA 6 NA 3 3 8 NA 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
idx=rowSums(!is.na(M))>=2 # Index of rows with 2 or more non-NA to run na.approx
library(zoo)
M[idx,]=t(na.approx(t(M[idx,]),rule=1,method="linear")) # I'm using t as na.approx works on columns
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 NA NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
I would like to extrapolate the boundaries (for City1 and City9) using a linear approximation based on the two preceding/following observations. For example M[1,1] should be 3 and M[1,10] should be 5,5.
Do you know how I could do this?
In extrap, nlead is the number of leading NAs in input vector x. non.na is the subset of elements of x which are not NA. Return the input if there are no leading NA elements or if there are fewer than 2 non-NA elements. m is the slope of the first two non-NAs. Replace the first nlead elements of x with the extrapolation. Finally we apply extrap to each row of M using MM[] <- so column names are preserved and then reverse each row, repeat and reverse back:
library(zoo)
extrap <- function(x) {
nlead <- which.min(x * 0) - 1
non.na <- na.omit(x)
if (length(nlead) == 0 || nlead == 0) || length(non.na) < 2) return(x)
m <- diff(head(non.na, 2))
replace(x, seq_len(nlead), non.na[1] - nlead:1 * m)
}
nc <- ncol(M)
naApprox <- function(x) if (length(na.omit(x)) < 2) x else na.approx(x, na.rm = FALSE)
MM <- M
MM[] <- t(apply(MM, 1, naApprox))
MM[] <- t(apply(MM, 1, extrap)) # extraploate to fill leading NAs
MM[] <- t(apply(MM[, nc:1], 1, extrap))[, nc:1] # extrapolate to fill trailing NAs
giving:
> MM
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 3.0 4.0 5.000000 4.000000 3.000000 3.500000 4.000000 4.500000 5.000000 5.500000
City2 6.0 4.5 3.000000 3.000000 3.500000 4.000000 6.000000 6.333333 6.666667 7.000000
City3 6.5 7.0 7.500000 8.000000 8.000000 8.000000 8.000000 8.000000 6.500000 5.000000
City4 3.0 5.0 3.000000 3.000000 3.000000 3.000000 5.000000 9.000000 8.000000 7.000000
City5 4.0 6.0 6.000000 6.666667 7.333333 8.000000 7.500000 7.000000 1.000000 -5.000000
City6 -4.0 -2.0 0.000000 2.000000 4.000000 6.000000 8.000000 3.000000 6.000000 7.000000
City7 9.0 3.0 3.166667 3.333333 3.500000 3.666667 3.833333 4.000000 4.166667 4.333333
City8 5.0 6.0 9.000000 8.000000 5.000000 3.666667 2.333333 1.000000 4.000000 7.000000
City9 9.0 7.5 6.000000 4.500000 3.000000 3.000000 8.000000 7.500000 7.000000 6.500000
City10 NA NA NA NA NA NA NA NA NA 1.000000
Note We used this as M:
M <- structure(c(NA, 6L, NA, 3L, 4L, NA, 9L, 5L, NA, NA, 4L, NA, 7L,
5L, 6L, NA, 3L, 6L, NA, NA, 5L, 3L, NA, 3L, 6L, NA, NA, 9L, 6L,
NA, NA, 3L, 8L, NA, NA, NA, NA, 8L, NA, NA, 3L, NA, 8L, NA, NA,
4L, NA, 5L, 3L, NA, NA, 4L, NA, 3L, 8L, NA, NA, NA, 3L, NA, NA,
6L, NA, 5L, NA, 8L, NA, NA, 8L, NA, NA, NA, 8L, 9L, 7L, 3L, 4L,
1L, NA, NA, 5L, NA, NA, 8L, 1L, 6L, NA, 4L, 7L, NA, NA, 7L, 5L,
7L, NA, 7L, NA, NA, NA, 1L), .Dim = c(10L, 10L), .Dimnames = list(
c("City1", "City2", "City3", "City4", "City5", "City6", "City7",
"City8", "City9", "City10"), c("Year1", "Year2", "Year3",
"Year4", "Year5", "Year6", "Year7", "Year8", "Year9", "Year10"
)))
Update: Fixed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With