I have the following data:
Lines = "20/03/2014,9996792524
21/04/2014,8479115468
21/09/2014,11394750532
16/10/2014,9594869828
18/11/2014,10850291677
08/12/2014,10475635302
22/01/2015,10116010939
26/02/2015,11206949341
20/03/2015,11975140317
09/04/2015,11526960332
29/04/2015,9986194500
16/09/2015,11501088256
13/10/2015,11833183163
10/11/2015,13246940910
16/12/2015,13255698568
27/01/2016,13775653990
23/02/2016,13567323648
22/03/2016,14607415705
11/04/2016,13835444224
04/04/2016,14118970743"
I read this into R:
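# header = FALSE: the data has no header row, so header = TRUE would swallow the first observation
z <- read.zoo(text = Lines, sep = ",", header = FALSE, index.column = 1, tz = "", format = "%d/%m/%Y")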
I wish to interpolate the data so that I can convert this irregularly spaced time series into a regular one. The time interval does not matter as long as it's regular; a monthly, weekly, or biweekly interval would do.
How do I do this in R or MATLAB?
Note: I realize interpolated values might not be very accurate and might misrepresent information; however, I need to learn how to do this and I'm all right with losing some accuracy.
OK, first of all, a word of warning: if you're going to interpolate and then perform tests or generic statistical estimation, your results will be (badly) biased, unless you have good reasons (domain knowledge?) to assume that your interpolation method generates points from the same distribution as the original points. And no, "the plot looks good" is not a good criterion for assessing this :) That being said, let's have a look at the data:
# Lines contains your data
library(zoo)
fmt <- "%d/%m/%Y"
# header = FALSE: the data has no header row (header = TRUE would silently drop 20/03/2014).
# No tz argument, so the index is a Date and differences are whole days in any time zone.
z <- read.zoo(text = Lines, sep = ",", header = FALSE, index.column = 1, format = fmt)
t <- time(z)
plot(z, type = "p", xaxt = "n", pch = 19, col = "cyan", cex = 1.5)
labs <- format(t, fmt)
axis(side = 1, at = t, labels = labs, cex.axis = 0.7)
It looks like the largest gaps in your data fall in summer 2014 and summer 2015. I'm curious to know what these data are... Anyway, most of your observations are spaced at least 2 weeks apart:
diff(t)
# Time differences in days
# [1] 32 153 25 33 20 45 35 22 20 20 140 27 28 36 42 27 28 13 7
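To put a number on "most", here is a small base-R sketch (no zoo needed) that sorts the dates from the question, computes the gaps in days and counts how many are shorter than the proposed 2-week step. The names `dates` and `gaps` are illustrative only.

```r
# Observation dates from the question (day/month/year)
dates <- as.Date(c("20/03/2014", "21/04/2014", "21/09/2014", "16/10/2014",
                   "18/11/2014", "08/12/2014", "22/01/2015", "26/02/2015",
                   "20/03/2015", "09/04/2015", "29/04/2015", "16/09/2015",
                   "13/10/2015", "10/11/2015", "16/12/2015", "27/01/2016",
                   "23/02/2016", "22/03/2016", "11/04/2016", "04/04/2016"),
                 format = "%d/%m/%Y")
dates <- sort(dates)               # the raw data is not in time order (last two rows)
gaps  <- as.numeric(diff(dates))   # days between consecutive observations
summary(gaps)                      # median gap is 28 days
sum(gaps < 14)                     # only 2 of the 19 gaps are shorter than 2 weeks
```

The two long gaps (153 and 140 days) are the summer holes mentioned above, and a 2-week grid is at least as fine as 17 of the 19 observed gaps.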
So let's interpolate to a biweekly series, first creating an empty (zero-width) zoo object indexed by a regular 2-week grid:
t.biweekly <- seq(from = min(t), to = max(t), by = "2 weeks")  # regular grid, one point every 14 days
dummy <- zoo(, t.biweekly)                                     # zero-width zoo: an index with no data
Merge the dummy series with yours:
z.interpolated <- merge(z,dummy,all=TRUE)
If you look at the new series, you'll see NA values at all times of dummy that don't have a corresponding time in z. Let's fill those points with linearly interpolated values and look at the result:
z.interpolated <- na.approx(z.interpolated)
plot(z.interpolated, type = "b")
points(z,pch=19,col="cyan",cex=1.5)
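Voilà! Note that z.interpolated still contains the original, irregularly spaced observations alongside the grid points; to keep only the regular 2-week grid, subset it with z.interpolated[t.biweekly]. Remember that building models for inference on this interpolated series is a bad idea...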
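If you would rather not depend on zoo, the same linear interpolation can be sketched in base R with approx(), evaluating it only at the regular grid points so that the result is strictly regular. to_regular() below is a hypothetical helper written for this example, not a function from zoo or base R; like na.approx() it interpolates linearly, and because the grid runs from the first to the last observation it never extrapolates.

```r
# Data from the question: day/month/year date, value (no header row)
Lines <- "20/03/2014,9996792524
21/04/2014,8479115468
21/09/2014,11394750532
16/10/2014,9594869828
18/11/2014,10850291677
08/12/2014,10475635302
22/01/2015,10116010939
26/02/2015,11206949341
20/03/2015,11975140317
09/04/2015,11526960332
29/04/2015,9986194500
16/09/2015,11501088256
13/10/2015,11833183163
10/11/2015,13246940910
16/12/2015,13255698568
27/01/2016,13775653990
23/02/2016,13567323648
22/03/2016,14607415705
11/04/2016,13835444224
04/04/2016,14118970743"

dat <- read.csv(text = Lines, header = FALSE, col.names = c("date", "value"),
                colClasses = c("character", "numeric"))
dat$date <- as.Date(dat$date, format = "%d/%m/%Y")
dat <- dat[order(dat$date), ]   # put the observations in time order

# Hypothetical helper: linearly interpolate an irregular series onto a
# regular Date grid running from the first to the last observation
to_regular <- function(dates, values, by) {
  grid <- seq(min(dates), max(dates), by = by)
  data.frame(date  = grid,
             value = approx(as.numeric(dates), values, xout = as.numeric(grid))$y)
}

biweekly <- to_regular(dat$date, dat$value, by = "2 weeks")
monthly  <- to_regular(dat$date, dat$value, by = "month")
head(biweekly)
```

by = "1 week" gives the weekly variant. Keep in mind that a calendar-month grid is only roughly regular, since months differ in length; if strictly equal spacing matters, a fixed step in days or weeks is the safer choice.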