linear interpolate missing values in time series

I would like to add all missing dates between the minimum and maximum date in a data.frame and linearly interpolate all missing values, like this:

df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
                                  "2015-10-12","2015-10-14")),       
                 value = c(8,3,9,NA,5))

      date value
2015-10-05     8
2015-10-08     3
2015-10-09     9
2015-10-12    NA
2015-10-14     5

Desired result:

      date value approx
2015-10-05     8      8
2015-10-06    NA   6.33
2015-10-07    NA   4.67
2015-10-08     3      3
2015-10-09     9      9
2015-10-10    NA   8.20
2015-10-11    NA   7.40
2015-10-12    NA   6.60
2015-10-13    NA   5.80
2015-10-14     5      5

Is there a clean solution with dplyr and approx()? (I don't like my 10-line for loop.)

asked Oct 17 '15 at 11:10 by ckluss


People also ask

What is linear interpolation for missing data?

Linear interpolation estimates a missing value by connecting the nearest known points before and after the gap with a straight line and reading the value off that line at the missing time. Many interpolation functions use it by default (for example, R's approx() uses method = "linear" unless told otherwise), so it usually does not need to be specified.
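Written as a formula (the notation here is illustrative, not from the original page): for a missing time $t$ between the known points $(t_0, y_0)$ and $(t_1, y_1)$, the linear estimate is

$$
\hat{y}(t) = y_0 + (y_1 - y_0)\,\frac{t - t_0}{t_1 - t_0}
$$

With the data from the question, 2015-10-06 lies 1 day into the 3-day gap between 2015-10-05 (value 8) and 2015-10-08 (value 3), so $\hat{y} = 8 + (3 - 8)\cdot\tfrac{1}{3} \approx 6.33$. Likewise 2015-10-10 lies 1 day into the 5-day gap between 2015-10-09 (value 9) and 2015-10-14 (value 5), giving $9 + (5 - 9)\cdot\tfrac{1}{5} = 8.2$, matching the desired approx column.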

Which interpolation method is best for time series?

There is no single best method. Linear interpolation is the simplest and most commonly used; spline interpolation or seasonal methods can suit smooth or strongly seasonal series better.

How do you interpolate missing time series data in Excel?

Often you may have one or more missing values in a series in Excel that you'd like to fill in. The simplest way to fill in missing values is to use the Fill Series function within the Editing section on the Home tab.


2 Answers

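Another short solution, using imputeTS. The missing dates have to be added first; otherwise the values are interpolated by position rather than by date:

library(zoo)
library(imputeTS)
x <- zoo(df$value, df$date)
# add the missing calendar days as NA (merge with a zero-width zoo on the full daily grid)
x <- merge(x, zoo(, seq(start(x), end(x), by = "day")))
# newer imputeTS versions call this na_interpolation(); older ones used na.interpolation()
x <- na_interpolation(x, option = "linear")
print(x)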
answered Nov 15 '22 at 21:11 by Steffen Moritz


Here is one way. I created a data frame containing the sequence of dates from the first to the last date (this assumes mydf is sorted by date), merged it with mydf using full_join() from the dplyr package, and then used na.approx() from the zoo package inside mutate() to do the interpolation.

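mydf <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
                                    "2015-10-12","2015-10-14")),
                   value = c(8,3,9,NA,5))

library(dplyr)
library(zoo)

# one row per day from the first to the last date
data.frame(date = seq(mydf$date[1], mydf$date[nrow(mydf)], by = 1)) %>%
  # bring in the observed values; days without an observation get NA
  full_join(mydf, by = "date") %>%
  # na.approx() linearly interpolates the internal NAs (rows are one day apart)
  mutate(approx = na.approx(value))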

#         date value   approx
#1  2015-10-05     8 8.000000
#2  2015-10-06    NA 6.333333
#3  2015-10-07    NA 4.666667
#4  2015-10-08     3 3.000000
#5  2015-10-09     9 9.000000
#6  2015-10-10    NA 8.200000
#7  2015-10-11    NA 7.400000
#8  2015-10-12    NA 6.600000
#9  2015-10-13    NA 5.800000
#10 2015-10-14     5 5.000000
answered Nov 15 '22 at 20:11 by jazzurro
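Since the question asks about approx() specifically, here is a dependency-free sketch in base R (not taken from either answer): stats::approx() can evaluate the interpolating line directly on the full daily grid through its xout argument, so the interpolated column needs no join-then-fill step. The names all_dates, known and res are illustrative only.

```r
# Same example data as in the question
df <- data.frame(date = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                                  "2015-10-12", "2015-10-14")),
                 value = c(8, 3, 9, NA, 5))

# Every calendar day between the earliest and latest date
all_dates <- seq(min(df$date), max(df$date), by = "day")

# Original values aligned to the full grid (NA where there is no observation)
res <- data.frame(date = all_dates,
                  value = df$value[match(all_dates, df$date)])

# Interpolate against the dates themselves (as day counts), using only rows
# that have a value; at known dates approx() returns the known value unchanged
known <- !is.na(df$value)
res$approx <- approx(x = as.numeric(df$date[known]),
                     y = df$value[known],
                     xout = as.numeric(all_dates))$y

print(res)
```

Passing x and xout as plain day counts via as.numeric() avoids relying on how approx() handles Date objects. Because the x values are the actual dates, this also interpolates by elapsed time if the input rows are unevenly spaced or unsorted (approx() sorts x internally).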