
How do I resample and interpolate timeseries data in R?

I have measurements that have been recorded approximately every 5 minutes:

2012-07-09T05:30:01+02:00   1906.1  1069.2  1093.2  3   1071.0  1905.7  
2012-07-09T05:35:02+02:00   1905.7  1069.2  1093.0  0   1071.5  1905.7  
2012-07-09T05:40:02+02:00   1906.1  1068.7  1093.2  0   1069.4  1905.7  
2012-07-09T05:45:02+02:00   1905.7  1068.4  1093.0  1   1069.6  1905.7  
2012-07-09T05:50:02+02:00   1905.7  1068.2  1093.0  4   1073.3  1905.7  

The first column is the data's timestamp. The remaining columns are the recorded data.

I need to resample my data so that I have one row every 15 minutes, e.g. something like:

2012-07-09T05:15:00 XX XX XX XX XX XX
2012-07-09T05:30:00 XX XX XX XX XX XX
....

(In addition, there may be gaps in the recorded data and I would like gaps of more than, say, one hour to be replaced with a row of NA values.)

I can think of several ways to program this by hand, but is there built-in support for doing that kind of stuff in R? I've looked at the different libraries for dealing with timeseries data (zoo, chron etc) but couldn't find anything satisfactory.

lindelof asked Jul 09 '12 04:07




1 Answer

You can use approx or the related approxfun. If t is the vector of time points at which your data were sampled and y is the vector of measurements, then f <- approxfun(t, y) creates a function f that linearly interpolates between the sampled points.

Example:

# irregular time points at which data was sampled
t <- c(5,10,15,25,30,40,50)
# measurements 
y <- c(4.3,1.2,5.4,7.6,3.2,1.2,3.7)

f <- approxfun(t,y)

# get interpolated values for time points 5, 20, 35, 50
f(seq(from=5,to=50,by=15))
[1] 4.3 6.5 2.2 3.7
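
The same idea can be carried over to timestamped data like that in the question. The following is only a minimal sketch under some assumptions: the timestamps are already parsed as POSIXct in UTC, only one value column is shown, the sample values (including the gap after 05:50) are made up for illustration, and the names `times`, `values`, `grid_num`, `max_gap` and `resampled` are hypothetical. It uses approx with an explicit xout grid and then blanks out grid points that fall inside a gap longer than one hour.

```r
# Hypothetical sample data modelled on the question, with an artificial gap
stamps <- c("2012-07-09 05:30:01", "2012-07-09 05:35:02",
            "2012-07-09 05:40:02", "2012-07-09 05:45:02",
            "2012-07-09 05:50:02", "2012-07-09 08:00:00",
            "2012-07-09 08:05:00")
times  <- as.POSIXct(stamps, tz = "UTC")   # assumption: UTC timestamps
values <- c(1906.1, 1905.7, 1906.1, 1905.7, 1905.7, 1904.0, 1904.5)

t_num <- as.numeric(times)                 # seconds since the epoch
step  <- 15 * 60                           # 15-minute grid
max_gap <- 60 * 60                         # gaps longer than this become NA

# Regular grid restricted to the observed range (no extrapolation)
grid_num <- seq(from = ceiling(min(t_num) / step) * step,
                to   = floor(max(t_num) / step) * step,
                by   = step)

# Linear interpolation at the grid points
interp <- approx(x = t_num, y = values, xout = grid_num)$y

# Index of the observation at or just before each grid point,
# clamped so that left + 1 is always a valid index
left <- findInterval(grid_num, t_num)
left <- pmax(1, pmin(left, length(t_num) - 1))

# Width of the interval each grid point falls into
gap <- t_num[left + 1] - t_num[left]
interp[gap > max_gap] <- NA

resampled <- data.frame(
  time  = as.POSIXct(grid_num, origin = "1970-01-01", tz = "UTC"),
  value = interp
)
print(resampled)
```

With this sample data, the grid points between 05:50 and 08:00 come out as NA, while 08:00 itself keeps the observed value because it coincides with a measurement. For several measurement columns, the approx call could be repeated per column, for example with sapply over the columns of a data frame.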
Adrian answered Nov 15 '22 05:11
