Daily time series with ts.. how to specify start and end [closed]

Tags: r, time-series

I have a daily time series that begins on a Saturday and ends on a Wednesday. There is a clear weekly period to it. It is stored in a vector a in R, so I try to convert it into a time series object:

ts(a,frequency=7)

This gives me:

Time Series:
Start = c(1, 1) 
End = c(13, 5) 

What do the (1,1) and (13,5) mean? And what is the best way to specify start and end in this scenario? All the examples on the internet deal with yearly data, not daily.

Rohit Pandey asked May 12 '14 08:05


1 Answer

Let's explore how ts works with different frequencies, using the documentation (?ts).

Let's say this is your data:

# sample() is random, so your values will differ from the listing below
dat <- data.frame(myts = sample(10, 24, replace = TRUE),
                  Date = seq(as.Date("2008-10-11"), as.Date("2008-10-11") + 23, by = 1))

# myts       Date
# 1     6 2008-10-11
# 2     9 2008-10-12
# 3     6 2008-10-13
# 4     9 2008-10-14
# 5     8 2008-10-15
# 6     6 2008-10-16
# 7     1 2008-10-17
# 8     9 2008-10-18
# 9     3 2008-10-19
# 10    5 2008-10-20
# 11    7 2008-10-21
# 12    4 2008-10-22
# 13    2 2008-10-23
# 14    9 2008-10-24
# 15    5 2008-10-25
# 16    9 2008-10-26
# 17    7 2008-10-27
# 18    8 2008-10-28
# 19    7 2008-10-29
# 20    2 2008-10-30
# 21    6 2008-10-31
# 22    6 2008-11-01
# 23    8 2008-11-02
# 24    1 2008-11-03

Let's compare the outputs for different frequencies on the same data, using an arbitrary start point:

print(ts(dat$myts, frequency = 7, start = c(1950, 3)), calendar = TRUE)
#      p1 p2 p3 p4 p5 p6 p7
# 1950        6  9  6  9  8
# 1951  6  1  9  3  5  7  4
# 1952  2  9  5  9  7  8  7
# 1953  2  6  6  8  1      
print(ts(dat$myts, frequency = 12, start = c(1950, 3)), calendar = TRUE)
#      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1950           6   9   6   9   8   6   1   9   3   5
# 1951   7   4   2   9   5   9   7   8   7   2   6   6
# 1952   8   1                                        
print(ts(dat$myts, frequency = 4, start = c(1950, 3)), calendar = TRUE)
#      Qtr1 Qtr2 Qtr3 Qtr4
# 1950              6    9
# 1951    6    9    8    6
# 1952    1    9    3    5
# 1953    7    4    2    9
# 1954    5    9    7    8
# 1955    7    2    6    6
# 1956    8    1          
print(ts(dat$myts, frequency = 7), calendar = TRUE)
#   p1 p2 p3 p4 p5 p6 p7
# 1  6  9  6  9  8  6  1
# 2  9  3  5  7  4  2  9
# 3  5  9  7  8  7  2  6
# 4  6  8  1    

We can learn 3 things from the outputs:

1- ts recognizes frequencies 12 and 4 and labels them as months and quarters, while it prints frequency 7 with generic labels (p1 to p7).

2- The first number in the start parameter is the period (a year, a week, etc., depending on the frequency), while the second number is the position of the first observation within that period (not every series begins in January or on a Sunday). The End value reads the same way, so End = c(13, 5) in your example means the last observation is the 5th one in period 13.

3- When you do not specify start, ts assumes that the series begins at the first position of the first period (thus the (1, 1) in your example).
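As a quick check of points 2 and 3, here is a minimal sketch using base R's start(), end() and time(). The length of 89 observations is an assumption inferred from the End = c(13, 5) in the question (12 full weeks plus 5 days), and the values are placeholders.

```r
# Placeholder data: 89 values, i.e. 12 full periods of 7 plus 5 more
a <- seq_len(89)
x <- ts(a, frequency = 7)

start(x)  # 1 1   (the default when start is omitted)
end(x)    # 13 5  (5th position of period 13)

# Each observation sits at time = period + (position - 1) / frequency
head(time(x), 3)

# Supplying start = c(period, position) only shifts the labels
y <- ts(a, frequency = 7, start = c(1, 7))
start(y)  # 1 7
end(y)    # 14 4
```

The data are identical in x and y; only the period and position labels attached to each observation change.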

Now, to make this time series more meaningful, you could compute the week number of the year (a year has about 52 weeks) and the day number of your first observation (e.g. 1 = Sunday, 2 = Monday, etc.) and pass them to the start parameter (see ?strftime):

# %U: week of the year, with weeks starting on Sunday
startW <- as.numeric(strftime(head(dat$Date, 1), format = "%U"))
# %w: 0 = Sunday, ..., 6 = Saturday; add 1 so that 1 = Sunday, ..., 7 = Saturday
startD <- as.numeric(strftime(head(dat$Date, 1), format = "%w")) + 1
print(ts(dat$myts, frequency = 7, start = c(startW, startD)), calendar = TRUE)
#    p1 p2 p3 p4 p5 p6 p7
# 40                    6
# 41  9  6  9  8  6  1  9
# 42  3  5  7  4  2  9  5
# 43  9  7  8  7  2  6  6
# 44  8  1

This means that our first observation (2008-10-11) fell on the Saturday (p7) of week 40 of 2008, as numbered by %U.
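To confirm that the second element of start lines each observation up with the right weekday, here is a small sketch using cycle(). The data are a hypothetical stand-in (24 placeholder values on the same dates as above), and the week label is set to an arbitrary 1 because only the day position matters for this check.

```r
# Hypothetical stand-in data: 24 daily values starting on Saturday 2008-10-11
dates <- seq(as.Date("2008-10-11"), by = "day", length.out = 24)
vals  <- seq_along(dates)

# Day number with 1 = Sunday, ..., 7 = Saturday (POSIXlt wday is 0 = Sunday)
day_no <- as.POSIXlt(dates)$wday + 1

# The week label (first element of start) is arbitrary here
x <- ts(vals, frequency = 7, start = c(1, day_no[1]))

# cycle() gives each observation's position within its period;
# it should match the day number computed from the dates
all(as.integer(cycle(x)) == day_no)

# Mean per weekday position (names "1" to "7" correspond to Sunday to Saturday)
tapply(vals, as.integer(cycle(x)), mean)
```

Using the wday field avoids weekdays(), whose output depends on the locale.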

David Arenburg answered Nov 16 '22 02:11