Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interpolate missing values in a time series with a seasonal cycle

I have a time series for which I want to intelligently interpolate the missing values. The value at a particular time is influenced by a multi-day trend, as well as its position in the daily cycle.

Here is an example in which the tenth observation is missing from myzoo

start <- as.POSIXct("2010-01-01") 
freq <- as.difftime(6, units = "hours") 
dayvals <- (1:4)*10 
timevals <- c(3, 1, 2, 4) 
index <- seq(from = start, by = freq, length.out = 16)
obs <- (rep(dayvals, each = 4) + rep(timevals, times = 4))
myzoo <- zoo(obs, index)
myzoo[10] <- NA

If I had to implement this, I'd use some kind of weighted mean of close times on nearby days, or add a value for the day to a function line fitted to the larger trend, but I hope there already exist some package or functions that apply to this situation?

EDIT: Modified the code slightly to clarify my problem. There are na.* methods that interpolate from nearest neighbors, but in this case they do not recognize that the missing value is at the time that is the lowest value of the day. Maybe the solution is to reshape the data to wide format and then interpolate, but I wouldn't like to completely disregard the contiguous values from the same day. It is worth noting that diff(myzoo, lag = 4) returns a vector of 10's. The solution may lie with some combination of reshape, na.spline, and diff.inv, but I just can't figure it out.

Here are three approaches that don't work: enter image description here

EDIT2. Image produced using the following code.

myzoo <- zoo(obs, index)
myzoo[10] <- NA # knock out the missing point
plot(myzoo, type="o", pch=16) # plot solid line
points(na.approx(myzoo)[10], col = "red")
points(na.locf(myzoo)[10], col = "blue")
points(na.spline(myzoo)[10], col = "green")
myzoo[10] <- 31 # replace the missing point
lines(myzoo, type = "o", lty=3, pch=16) # dashed line over the gap
legend(x = "topleft", 
       legend = c("na.spline", "na.locf", "na.approx"), 
       col=c("green","blue","red"), pch = 1)
like image 586
J. Win. Avatar asked Feb 11 '11 00:02

J. Win.


People also ask

Can Arima handle missing values?

You can fit ARIMA models with missing values easily because all ARIMA models are state space models and the Kalman filter, which is used to fit state space models, deals with missing values exactly by simply skipping the update phase.

How do you impute missing values in a time series data Python?

One way to impute missing values in a time series data is to fill them with either the last or the next observed values. Pandas have fillna() function which has method parameter where we can choose “ffill” to fill with the next observed value or “bfill” to fill with the previously observed value.

What is interpolation in time series data?

Interpolation is mostly used while working with time-series data because in time-series data we like to fill missing values with previous one or two values. for example, suppose temperature, now we would always prefer to fill today's temperature with the mean of the last 2 days, not with the mean of the month.


1 Answers

Try this:

x <- ts(myzoo,f=4)
fit <- ts(rowSums(tsSmooth(StructTS(x))[,-2]))
tsp(fit) <- tsp(x)
plot(x)
lines(fit,col=2)

The idea is to use a basic structural model for the time series, which handles the missing value fine using a Kalman filter. Then a Kalman smooth is used to estimate each point in the time series, including any omitted.

I had to convert your zoo object to a ts object with frequency 4 in order to use StructTS. You may want to change the fitted values back to zoo again.

like image 134
Rob Hyndman Avatar answered Oct 05 '22 03:10

Rob Hyndman