
R time series modeling on weekly data using ts() object

Tags: r, time-series

I am trying to do time series modeling and forecasting using R based on weekly data like below:

biz week     Amount        Count
2006-12-27    973710.7    816570
2007-01-03   4503493.2   3223259
2007-01-10   2593355.9   1659136
2007-01-17   2897670.9   2127792
2007-01-24   3590427.5   2919482
2007-01-31   3761025.7   2981363
2007-02-07   3550213.1   2773988
2007-02-14   3978005.1   3219907
2007-02-21   4020536.0   3027837
2007-02-28   4038007.9   3191570
2007-03-07   3504142.2   2816720
2007-03-14   3427323.1   2703761
...
2014-02-26  99999999.9   1234567

About my data: as seen above, each week is labeled by its first day (my week starts on Wednesday and ends on Tuesday). When I construct my ts object, I tried

ts <- ts(df, frequency=52, start=c(2007,1)) 

The problems I have are:

1) Some years have 53 weeks, so frequency=52 will not work for those years.

2) My starting week/date is 2006-12-27. How should I set the start parameter: start=c(2006,52) or start=c(2007,1), given that the week of 2006-12-27 crosses the year boundary? Also, for modeling, is it better to use only complete years of data? For example, if I only have a partial year for 2007, should I drop it and start with 2008? What about 2014: since it is not a complete year yet, should I use what I have for modeling or not? Either way, I still have to decide what to do with weeks that cross the year boundary, like 2006-12-27. Should I count it as week 1 of 2007 or as the last week of 2006?

3) When I use ts <- ts(df, frequency=52, start=c(2007,1)) and then print it, I get the results shown below. Instead of 2007.01, 2007.02, ..., 2007.52, I get 2007.000, 2007.019, ..., where the step comes from 1/52=0.019. This is mathematically correct but not easy to interpret. Is there a way to label each row with the date itself, as in a data frame, or at least as 2007 wk1, 2007 wk2, ...?

=========

Time Series:
Start = c(2007, 1)
End = c(2014, 11)
Frequency = 52
            Amount    Count
2007.000   645575.4   493717
2007.019  2185193.2  1659577
2007.038  1016711.8   860777
2007.058  1894056.4  1450101
2007.077  2317517.6  1757219
2007.096  2522955.8  1794512
2007.115  2266107.3  1723002

4) My goal is to model this weekly data and then decompose it to see the seasonal components. It seems I have to use the ts() function to convert the data to a ts object so that I can use the decompose() function. I tried xts() and got an error stating "time series has no or less than 2 periods". I guess this is because xts() won't let me specify the frequency, right?

xts <- xts(df,order.by=businessWeekDate) 

5) I looked for the answer in this forum and other places as well; most of the examples are monthly, and though there are some weekly time series questions, none of the answers are straightforward. Hopefully somebody can help answer my questions here.

user3281664 asked Mar 05 '14 04:03




2 Answers

Using a non-integer frequency works quite well and is compatible with most models (auto.arima, ets, ...). For the start date, I just use the convenience functions in lubridate. The important thing here is to be consistent when working with multiple time series that may have different start and end dates.

library(lubridate)
ts(df$Amount,
   frequency = 365.25/7,
   start = decimal_date(ymd("2006-12-27")))
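To see how closely a 365.25/7 frequency tracks the real calendar, here is a minimal sketch in base R. It rests on several assumptions: the data are simulated (not the asker's), `decimal_year()` is a hypothetical helper standing in for lubridate's `decimal_date()`, and the variable names are made up for illustration.

```r
# Hypothetical helper: decimal year of a Date, computed in base R
# (year + elapsed days in the year / number of days in that year).
decimal_year <- function(d) {
  d <- as.Date(d)
  year <- as.integer(format(d, "%Y"))
  year_start <- as.Date(paste0(year, "-01-01"))
  next_start <- as.Date(paste0(year + 1L, "-01-01"))
  year + as.numeric(d - year_start) / as.numeric(next_start - year_start)
}

# Simulated stand-in for the weekly series: one value per Wednesday.
week_start <- seq(as.Date("2006-12-27"), as.Date("2014-02-26"), by = "week")
set.seed(42)
amount <- 3e6 + 5e5 * sin(2 * pi * seq_along(week_start) / 52.18) +
  rnorm(length(week_start), sd = 1e5)

# Non-integer frequency: average number of weeks per year.
weekly <- ts(amount, frequency = 365.25 / 7,
             start = decimal_year("2006-12-27"))

# Largest gap (in years) between the ts time index and the real dates.
drift <- max(abs(as.numeric(time(weekly)) - decimal_year(week_start)))
drift
```

Because the time index stays close to the true decimal date, the original dates in `week_start` can be kept alongside the series and used for labeling output or plots.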
nassimhddd answered Sep 21 '22 11:09


First, make sure that your data has exactly 52 observations per year. To do that, identify the years with 53 observations and remove the week that matters least for your seasonality pattern (for instance, do not remove a week in December if you want to check the seasonality of Christmas sales!).
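A rough sketch of that check, assuming weeks are assigned to the calendar year of their start date and using a generated sequence of Wednesdays in place of the real data. Which week to drop is a judgment call; the row removed here is an arbitrary mid-year choice, purely for illustration.

```r
# Week start dates covering the same span as the question's data.
week_start <- seq(as.Date("2006-12-27"), as.Date("2014-02-26"), by = "week")
df <- data.frame(week_start = week_start,
                 year = format(week_start, "%Y"),
                 stringsAsFactors = FALSE)

# Count observations per calendar year and flag years with 53 weeks.
weeks_per_year <- table(df$year)
long_years <- names(weeks_per_year)[weeks_per_year == 53]
long_years

# Drop one week from each long year (here: the 27th week, chosen arbitrarily).
drop_rows <- unlist(lapply(long_years, function(y) which(df$year == y)[27]))
trimmed <- if (length(drop_rows) > 0) df[-drop_rows, ] else df
table(trimmed$year)
```

The partial years at either end (2006 and 2014 here) still have fewer than 52 rows; only the full years are brought to exactly 52.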

xts is a good format because it is more flexible; however, the decomposition and forecasting tools usually work with ts objects, as they require a fixed number of observations per cycle.
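As an illustration of why the frequency matters, the sketch below runs decompose() on a simulated weekly series (an assumption, not the asker's data). The same values stored with frequency 1 reproduce the error message quoted in the question, which suggests the series reached decompose() without a seasonal period.

```r
# Simulated weekly series: 7 years of 52 observations each.
set.seed(1)
n <- 52 * 7
amount <- 3e6 + 5e5 * sin(2 * pi * seq_len(n) / 52) + rnorm(n, sd = 1e5)

# With a fixed period of 52, decompose() can estimate the seasonal figure.
weekly <- ts(amount, frequency = 52, start = c(2007, 1))
parts <- decompose(weekly)
length(parts$figure)   # one seasonal effect per position in the cycle

# Without a period (frequency = 1), decompose() refuses to run.
msg <- tryCatch(decompose(ts(amount)),
                error = function(e) conditionMessage(e))
msg
```

The components are available as `parts$trend`, `parts$seasonal`, and `parts$random`, and `plot(parts)` draws them together.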

Regarding your question about incomplete years: it should not be an issue. R doesn't know when January or December is, so a year can start and end at any point.
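A small sketch of what that means in practice, using a toy series of 60 values (an assumption for illustration). The start argument only sets where the first observation sits in the cycle, and window() can still pick out a complete year afterwards.

```r
# Toy series: first observation placed at position 52 of the 2006 cycle.
x <- ts(seq_len(60), frequency = 52, start = c(2006, 52))

start(x)            # year and position of the first observation
head(cycle(x), 3)   # position within the cycle for the first three weeks

# Extract one complete cycle if a model or plot needs whole years only.
full_2007 <- window(x, start = c(2007, 1), end = c(2007, 52))
length(full_2007)
```

Whether the week of 2006-12-27 is labeled as the last week of 2006 or the first week of 2007 only shifts the labels by one position; the fitted seasonal pattern is the same, just offset by a week.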

RockScience answered Sep 18 '22 11:09