I have a data frame with missing values:
X Y Z
54 57 57
100 58 58
NA NA NA
NA NA NA
NA NA NA
60 62 56
NA NA NA
NA NA NA
69 62 62
I want to impute the NA values by linear interpolation between the known values, so that the data frame looks like this:
X Y Z
54 57 57
100 58 58
90 59 57.5
80 60 57
70 61 56.5
60 62 56
63 62 58
66 62 60
69 62 62
Thanks!
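Base R's approxfun() returns a function that linearly interpolates the data it is handed. By default, (x, y) pairs where y is NA are dropped before that function is built, so evaluating it at every row position fills the gaps.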
## Make easily reproducible data
df <- read.table(text="X Y Z
54 57 57
100 58 58
NA NA NA
NA NA NA
NA NA NA
60 62 56
NA NA NA
NA NA NA
69 62 62", header = TRUE)
## See how this works on a single vector
approxfun(1:9, df$X)(1:9)
# [1] 54 100 90 80 70 60 63 66 69
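To make explicit what approxfun() computes here, the sketch below fills each interior gap by hand with the straight-line formula y = y0 + (y1 - y0) * (i - i0) / (i1 - i0), where i0 and i1 are the nearest observed positions on either side. The helper name `interp_linear` is hypothetical and only for illustration; it leaves NAs at the very start or end untouched, as approxfun() does by default.

```r
# Hypothetical helper: linear interpolation of interior NAs, written out by hand
interp_linear <- function(v) {
  known <- which(!is.na(v))            # positions with observed values
  out <- v
  for (i in which(is.na(v))) {
    left  <- known[known < i]
    right <- known[known > i]
    # no neighbour on one side: leave this NA alone (edge gap)
    if (length(left) == 0 || length(right) == 0) next
    lo <- max(left)                    # nearest observed position before i
    hi <- min(right)                   # nearest observed position after i
    out[i] <- v[lo] + (v[hi] - v[lo]) * (i - lo) / (hi - lo)
  }
  out
}

x <- c(54, 100, NA, NA, NA, 60, NA, NA, 69)
interp_linear(x)
# [1]  54 100  90  80  70  60  63  66  69
```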
## Apply interpolation to each of the data.frame's columns
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
# X Y Z
# 1 54 57 57.0
# 2 100 58 58.0
# 3 90 59 57.5
# 4 80 60 57.0
# 5 70 61 56.5
# 6 60 62 56.0
# 7 63 62 58.0
# 8 66 62 60.0
# 9 69 62 62.0
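One caveat with the one-liner above: approxfun() uses rule = 1 by default, which returns NA outside the range of observed values, so a column that starts or ends with NA keeps those NAs. Passing rule = 2 carries the nearest observed value outward instead. The vector `v` below is a made-up example with missing values at both ends.

```r
# Made-up vector with missing values at the start and the end
v <- c(NA, 100, NA, NA, NA, 60, NA, NA, NA)
pos <- seq_along(v)

# Default rule = 1: positions outside the observed range stay NA
approxfun(pos, v)(pos)
# [1]  NA 100  90  80  70  60  NA  NA  NA

# rule = 2: use the closest observed value at either end
approxfun(pos, v, rule = 2)(pos)
# [1] 100 100  90  80  70  60  60  60  60
```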
I can recommend the imputeTS package, which I maintain (even though it is intended for time series imputation).
For this case it would work like this:
library(imputeTS)
df$X <- na_interpolation(df$X, option ="linear")
df$Y <- na_interpolation(df$Y, option ="linear")
df$Z <- na_interpolation(df$Z, option ="linear")
As mentioned, the package expects a time series or a vector as input, which is why each column has to be passed separately.
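Instead of writing one line per column, lapply() can make the per-column calls for you. A minimal sketch, assuming imputeTS is installed; the data frame is rebuilt so the block runs on its own, and the extra argument `option = "linear"` is forwarded by lapply() to na_interpolation().

```r
library(imputeTS)

df <- read.table(text = "X Y Z
54 57 57
100 58 58
NA NA NA
NA NA NA
NA NA NA
60 62 56
NA NA NA
NA NA NA
69 62 62", header = TRUE)

# Apply na_interpolation() to every column; `df[] <-` keeps df a data.frame
df[] <- lapply(df, na_interpolation, option = "linear")
df
```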
The package also offers many other imputation functions, e.g. spline interpolation.
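In imputeTS, spline interpolation is selected with `na_interpolation(x, option = "spline")`. For comparison, here is a base-R analogue (a separate sketch, not imputeTS's own implementation): fit an interpolating cubic spline through the observed points with splinefun() and evaluate it at the missing positions, keeping the observed values as they are.

```r
# Base-R spline gap filling (illustrative analogue, not imputeTS internals)
x <- c(54, 100, NA, NA, NA, 60, NA, NA, 69)
pos <- seq_along(x)
ok <- !is.na(x)                                     # observed positions

f <- splinefun(pos[ok], x[ok], method = "fmm")      # spline through known points
x_spline <- ifelse(ok, x, f(pos))                   # fill only the gaps
x_spline
```

Unlike the straight-line fill, the spline can overshoot between widely spaced points, so it is worth checking the filled values against the data's plausible range.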