
How to turn (interpolate) this irregularly spaced time series into a regularly spaced one in R or Matlab?

I have the following data:

Lines = "20/03/2014,9996792524
21/04/2014,8479115468
21/09/2014,11394750532
16/10/2014,9594869828
18/11/2014,10850291677
08/12/2014,10475635302
22/01/2015,10116010939
26/02/2015,11206949341
20/03/2015,11975140317
09/04/2015,11526960332
29/04/2015,9986194500
16/09/2015,11501088256
13/10/2015,11833183163
10/11/2015,13246940910
16/12/2015,13255698568
27/01/2016,13775653990
23/02/2016,13567323648
22/03/2016,14607415705
11/04/2016,13835444224
04/04/2016,14118970743"

I read this into R:

z <- read.zoo(text = Lines, sep = ",", header = FALSE, index.column = 1, tz = "", format = "%d/%m/%Y")

I want to interpolate the data so that I can convert this irregularly spaced time series into a regularly spaced one. The time interval does not matter as long as it's regular; a monthly, weekly, or bi-weekly interval would do.

How do I do this in R or Matlab?

Note: I realize interpolated values might not be very accurate and might misrepresent information; however, I need to learn how to do this and I'm alright with losing some accuracy.

learnerX asked Oct 30 '22 22:10


1 Answer

Ok, first of all, a word of warning: if you're going to interpolate and then perform tests or generic statistical estimation, your results will be (badly) biased unless you have good reasons (domain knowledge?) to assume that your interpolation method generates points from the same distribution as the original points. And no, "the plot looks good" is not a good criterion for assessing this :) That being said, let's have a look at the data:

# Lines contains your data (it has no header row, hence header = FALSE;
# with header = TRUE the first observation would be silently dropped)
library(zoo)
fmt <- "%d/%m/%Y"
z <- read.zoo(text = Lines, sep = ",", header = FALSE, index.column = 1, tz = "", format = fmt)
t <- time(z)
# Plot the raw points without the default x axis, then label each observation date
plot(z, type = "p", xaxt = "n", pch = 19, col = "cyan", cex = 1.5)
labs <- format(t, fmt)
axis(side = 1, at = t, labels = labs, cex.axis = 0.7)

It looks like most of your missing data fall in summer 2014 and summer 2015. I'm curious to know what these data are... Anyway, most of your observations are at least 2 weeks apart:

diff(t)
# Time differences in days
# [1]  32 153  25  33  20  45  35  22  20  20 140  27  28  36  42  27  28  13   7

Thus let's interpolate to a biweekly series by first creating an empty dummy zoo object on a biweekly time grid:

t.biweekly <- seq(from = min(t), to = max(t), by = "2 weeks")
dummy <- zoo(, t.biweekly)
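
The question mentions that a weekly or monthly interval would also do. Here is a minimal sketch of how such grids could be built with base R's seq() on Date objects. The start and end dates are copied from the question's data, and the variable names are only illustrative:

```r
# First and last observation dates from the question's data
d.first <- as.Date("20/03/2014", format = "%d/%m/%Y")
d.last  <- as.Date("11/04/2016", format = "%d/%m/%Y")

# Candidate regular grids; any of these could play the role of the biweekly one
grid.weekly   <- seq(from = d.first, to = d.last, by = "week")
grid.biweekly <- seq(from = d.first, to = d.last, by = "2 weeks")
grid.monthly  <- seq(from = d.first, to = d.last, by = "month")

# Step sizes in days: constant for weekly and biweekly grids,
# varying with month length for the monthly grid
table(as.numeric(diff(grid.weekly)))
table(as.numeric(diff(grid.biweekly)))
table(as.numeric(diff(grid.monthly)))
```

Note that a monthly grid is regular only in calendar terms (same day of each month), not in number of days, which may or may not matter for what you do next.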

Merge the dummy series with yours:

z.interpolated <- merge(z,dummy,all=TRUE)

If you look at the new series, you'll see there are NA values at all times of dummy which don't have a corresponding time in z. Let's fill those points with linearly interpolated values and look at the result:

z.interpolated <- na.approx(z.interpolated)
plot(z.interpolated, type = "b")
points(z,pch=19,col="cyan",cex=1.5)
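
One thing to keep in mind: because of merge(..., all = TRUE), z.interpolated contains the original observation dates as well as the biweekly grid, so its index is still not strictly regular. Below is a minimal sketch of one way to keep only the grid points, using the xout argument of na.approx on a small toy series (the values are made up for illustration and are not the data from the question):

```r
library(zoo)

# Toy irregular series (hypothetical values, for illustration only)
dates <- as.Date(c("2014-03-20", "2014-04-21", "2014-09-21", "2014-10-16"))
z.toy <- zoo(c(10, 8, 11, 9), dates)

# Regular biweekly grid spanning the observed range
grid <- seq(from = start(z.toy), to = end(z.toy), by = "2 weeks")

# Interpolate linearly in time and return values only at the grid points
z.regular <- na.approx(z.toy, xout = grid)

# Check that consecutive index values are all the same distance apart
is.regular(z.regular, strict = TRUE)
```

With the objects from this answer, the analogous call would be na.approx(z, xout = t.biweekly).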


Voilà! Remember that building models for inference out of this thing is a bad idea...
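
To make the warning concrete, here is a small simulation sketch on synthetic data (not the series from the question; the seed and sample size are arbitrary choices). It interpolates linearly at the midpoints between independent observations. Each interpolated value is the average of its two neighbours, so its variance is about half that of the original observations, and statistics computed on a series that mixes both kinds of points inherit that distortion:

```r
set.seed(1)        # arbitrary seed, only for reproducibility
n <- 1000
x <- rnorm(n)      # independent observations at times 1, 2, ..., n

# Linear interpolation at the midpoints between consecutive observations
mid <- approx(x = seq_len(n), y = x, xout = seq(1.5, n - 0.5, by = 1))$y

var(x)    # variance of the observed points
var(mid)  # variance of the interpolated points, roughly half of var(x)
```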

DeltaIV answered Nov 09 '22 11:11