
Imputing missing values linearly in R

Tags:

r

missing-data

I have a data frame with missing values:

X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62

I want to fill in the NA values by linear interpolation between the known values, so that the data frame looks like this:

X   Y    Z
54  57  57
100 58  58
90  59  57.5
80  60  57
70  61  56.5
60  62  56
63  62  58
66  62  60
69  62  62

thanks

Filly asked Mar 27 '14 16:03


2 Answers

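Base R's approxfun() returns a function that linearly interpolates between the data points it is given. Points whose value is NA are dropped before that function is built, so evaluating it at every row index fills the gaps.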

## Make easily reproducible data
df <- read.table(text="X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62", header = TRUE)

## See how this works on a single vector
approxfun(1:9, df$X)(1:9)
# [1]  54 100  90  80  70  60  63  66  69

## Apply interpolation to each of the data.frame's columns
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
#     X  Y    Z
# 1  54 57 57.0
# 2 100 58 58.0
# 3  90 59 57.5
# 4  80 60 57.0
# 5  70 61 56.5
# 6  60 62 56.0
# 7  63 62 58.0
# 8  66 62 60.0
# 9  69 62 62.0
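
One caveat: the example data starts and ends with known values. With its default settings, approxfun() returns NA for positions outside the range of the known points, so leading or trailing NAs would stay NA. The following is a minimal sketch, using a made-up vector x (not taken from the question), of how rule = 2 carries the nearest known value outward instead:

```r
## Hypothetical vector with NAs at both ends (illustration only)
x   <- c(NA, 1, NA, 3, NA)
idx <- seq_along(x)

## Default (rule = 1): only gaps between known values are filled
inner_only <- approxfun(idx, x)(idx)
inner_only
# [1] NA  1  2  3 NA

## rule = 2: positions outside the known range take the nearest known value
extended <- approxfun(idx, x, rule = 2)(idx)
extended
# [1] 1 1 2 3 3
```

Whether extending the edge values is appropriate depends on the data; leaving them as NA is the more conservative choice.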
Josh O'Brien answered Sep 22 '22 02:09


I can recommend the imputeTS package, which I maintain (although it is designed for time series imputation).

For this case it would work like this:

library(imputeTS)
df$X <- na_interpolation(df$X, option = "linear")
df$Y <- na_interpolation(df$Y, option = "linear")
df$Z <- na_interpolation(df$Z, option = "linear")

As mentioned, the package expects time series / vector input, which is why each column is passed to the function separately.
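
The per-column calls above can also be written as a single lapply() over the columns. This is only a sketch: it assumes imputeTS is installed in a version that exports na_interpolation(), that every column is numeric, and it rebuilds the question's data so that it runs on its own.

```r
library(imputeTS)

## Rebuild the data from the question
df <- data.frame(
  X = c(54, 100, NA, NA, NA, 60, NA, NA, 69),
  Y = c(57,  58, NA, NA, NA, 62, NA, NA, 62),
  Z = c(57,  58, NA, NA, NA, 56, NA, NA, 62)
)

## Assigning into df[] replaces every column but keeps df a data.frame
df[] <- lapply(df, na_interpolation, option = "linear")
df
```

Assigning to df[] rather than df keeps the column names and the data.frame class, so the result can be used as a drop-in replacement for the original object.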

The package also offers many other imputation functions, e.g. spline interpolation.

Steffen Moritz answered Sep 22 '22 02:09