I have a timeseries of monthly data with lots of missing datapoints, set to NA. I want to simply subtract the annual cycle from the data, ignoring the missing entries. It seems that the decompose function can't handle missing data points, but I have seen elsewhere that the seasonal package is suggested instead. However I am also running into problems there too with the NA.
Here is a minimum reproducible example of the problem using a built in dataset...
library(seasonal)
# set range to missing NA in Co2 dataset
c2<-co2
c2[c2>330 & c2<350]=NA
seas(c2,na.action=na.omit)
Error in na.omit.ts(x) : time series contains internal NAs
Yes, I know! that's why I asked you to omit them! Let's try this:
seas(c2,na.action=na.x13)
Error: X-13 run failed
Errors:
- Adding MV1981.Apr exceeds the number of regression effects
allowed in the model (80).
Hmmm, interesting, no idea what that means, okay, please just exclude the NA:
seas(c2,na.action=na.exclude)
Error in na.omit.ts(x) : time series contains internal NAs
that didn't help much! and for good measure
decompose(c2)
Error in na.omit.ts(x) : time series contains internal NAs
I'm on the following:
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Why is leaving out NA such a problem? I'm obviously being completely stupid, but I can't see what I'm doing wrong with the seas function. Happy to consider an alternative solution using xts.
My first solution, simply manually calculating the seasonal cycle, converting to a dataframe to subtract the vector and then transforming back.
# seasonal cycle
scycle=tapply(c2,cycle(c2),mean,na.rm=T)
# converting to df
df=tapply(c2, list(year=floor(time(c2)), month = cycle(c2)), c)
# subtract seasonal cycle
for (i in 1:nrow(df)){df[i,]=df[i,]-scycle}
# convert back to timeseries
anomco2=ts(c(t(df)),start=start(c2),freq=12)
Not very pretty, and not very efficient either.
The comment of missuse lead me to another Seasonal decompose of monthly data including NA in r I missed with a near duplicate question and this suggested the package zoo, which seems to work really well for additive series
library(zoo)
c2=co2
c2[c2>330&c2<350]=NA
d=decompose(na.StructTS(c2))
plot(co2)
lines(d$x,col="red")
shows that the series is very well reconstructed through the missing period.
The output of deconstruct has the trend and seasonal cycle available. I wish I could transfer my bounty to user https://stackoverflow.com/users/516548/g-grothendieck for this helpful response. Thanks to user missuse too.
However, if the missing portion is at the end of the series, the software has to extrapolate the trend and has more difficulties. The original series (in black) maintains the trend, while the trend is smaller in the reconstructed series (red):
c2=co2
c2[c2>350]=NA
d=decompose(na.StructTS(c2))
plot(co2)
lines(d$x,col="red")
Lastly, if instead the missing portion is at the start of the series, the software is unable to extrapolate backwards in time and throws an error... I feel another SO question coming on...
c2=co2
c2[c2<330]=NA
d=decompose(na.StructTS(c2))
Error in StructTS(y) :
the first value of the time series must not be missing
You could just use some algorithm that fills the missing data before. (e.g. from package imputeTS or zoo)
imputeTS for example has extra imputation algorithms for seasonal time series e.g.:
x <- na_seadec(co2)
Another good option for seasonal data:
x <- na_kalman(co2)
And now just go on without the missing data.
An important hint from Adrian Tompkins (see also comment below): This will work best, when the missing data is somewhere in the middle. For a lot of leading NAs the method is no good choice. In this case it fills the NAs, but it is not able to extrapolate the trend backwards:
c2<-co2
c2[c2<330]<-NA
c3<-na_kalman(c2)
c4<-na_seadec(c2)
plot(co2)
lines(c3,col="blue")
lines(c4,col="red")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With