I have a daily time series that begins on a Saturday and ends on a Wednesday. It has a clear weekly period. It is stored in a vector a in R, so I tried to convert it into a time series object:
ts(a,frequency=7)
This gives me -
Time Series:
Start = c(1, 1)
End = c(13, 5)
What do the (1,1) and (13,5) mean? And what is the best way to specify start and end in this scenario? All the examples on the internet deal with yearly data, not daily.
Let's explore how ts works with different frequencies, using the documentation (?ts).
Let's say this is your data:
dat <- data.frame(myts = sample(10, 24, replace = T),
Date = seq(as.Date("2008-10-11"), as.Date("2008-10-11") + 23, by = 1))
# myts Date
# 1 6 2008-10-11
# 2 9 2008-10-12
# 3 6 2008-10-13
# 4 9 2008-10-14
# 5 8 2008-10-15
# 6 6 2008-10-16
# 7 1 2008-10-17
# 8 9 2008-10-18
# 9 3 2008-10-19
# 10 5 2008-10-20
# 11 7 2008-10-21
# 12 4 2008-10-22
# 13 2 2008-10-23
# 14 9 2008-10-24
# 15 5 2008-10-25
# 16 9 2008-10-26
# 17 7 2008-10-27
# 18 8 2008-10-28
# 19 7 2008-10-29
# 20 2 2008-10-30
# 21 6 2008-10-31
# 22 6 2008-11-01
# 23 8 2008-11-02
# 24 1 2008-11-03
Let's compare the outputs for different frequencies on the same data, using an arbitrary start point:
print(ts(dat$myts, frequency = 7, start = c(1950, 3)), calendar = T)
#      p1 p2 p3 p4 p5 p6 p7
# 1950        6  9  6  9  8
# 1951  6  1  9  3  5  7  4
# 1952  2  9  5  9  7  8  7
# 1953  2  6  6  8  1
print(ts(dat$myts, frequency = 12, start = c(1950, 3)), calendar = T)
#      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1950           6   9   6   9   8   6   1   9   3   5
# 1951   7   4   2   9   5   9   7   8   7   2   6   6
# 1952   8   1
print(ts(dat$myts, frequency = 4, start = c(1950, 3)), calendar = T)
#      Qtr1 Qtr2 Qtr3 Qtr4
# 1950              6    9
# 1951    6    9    8    6
# 1952    1    9    3    5
# 1953    7    4    2    9
# 1954    5    9    7    8
# 1955    7    2    6    6
# 1956    8    1
print(ts(dat$myts, frequency = 7), calendar = T)
#   p1 p2 p3 p4 p5 p6 p7
# 1  6  9  6  9  8  6  1
# 2  9  3  5  7  4  2  9
# 3  5  9  7  8  7  2  6
# 4  6  8  1
We can learn three things from the outputs:
1- ts recognizes the frequencies 12 and 4 and labels them as months and quarters, while it prints frequency 7 with generic period labels (p1 to p7).
2- The first number in the start parameter is the number of the period (a year, a week, etc., depending on the frequency), while the second number is the position of the first observation within that period (as not all series begin in January or on a Sunday).
3- When you don't specify the start point, ts assumes that you are starting from the beginning of the first period (thus the (1,1) in your example). By the same logic, End = c(13, 5) means that your last observation is the 5th observation of the 13th period.
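As a quick check of points 2 and 3, here is a minimal sketch on a made-up series. The length of 89 observations (12 full weeks plus 5 days) is an assumption, chosen only because it reproduces the End = c(13, 5) from the question; start(), end(), tsp() and time() are the standard accessors for ts objects in the stats package.

```r
# Hypothetical series: 89 daily values = 12 full periods of 7, plus 5 extra
x <- ts(seq_len(89), frequency = 7)

start(x)   # 1 1   (first period, first position)
end(x)     # 13 5  (13th period, 5th position)
tsp(x)     # start time, end time and frequency on the numeric time scale

# With an explicit start, the second number is the position within the period
y <- ts(seq_len(10), frequency = 7, start = c(5, 3))
start(y)                     # 5 3
first_time <- as.numeric(time(y))[1]
first_time                   # equals 5 + (3 - 1) / 7
```

In other words, internally the pair c(period, position) is stored as the single number period + (position - 1) / frequency.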
Now, to make this time series more meaningful, you could compute the week number of the year (a year has about 52 weeks) and the day number of your first observation (e.g. 1 = Sunday, 2 = Monday, etc.) and pass them to the start parameter (see ?strftime).
# Week of the year, with weeks starting on Sunday ("%U")
startW <- as.numeric(strftime(head(dat$Date, 1), format = "%U"))
# "%w" runs 0-6 from Sunday, so add 1 to get 1 = Sunday, ..., 7 = Saturday
startD <- as.numeric(strftime(head(dat$Date, 1), format = "%w")) + 1
print(ts(dat$myts, frequency = 7, start = c(startW, startD)), calendar = T)
#    p1 p2 p3 p4 p5 p6 p7
# 40                    6
# 41  9  6  9  8  6  1  9
# 42  3  5  7  4  2  9  5
# 43  9  7  8  7  2  6  6
# 44  8  1
Which means that our first observation (which occurred on 2008-10-11) was the Saturday (day 7) of week 40 of 2008, counting weeks from Sunday as %U does.
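One way to sanity-check the alignment is to compare the position that ts assigns to each observation with the weekday of the corresponding date. The sketch below assumes the numbering 1 = Sunday, ..., 7 = Saturday and uses placeholder values; the period number 1 is arbitrary and only there for illustration. cycle() is the stats function that returns each observation's position within its period.

```r
# Same 24 consecutive dates as in the example data
dates <- seq(as.Date("2008-10-11"), by = "day", length.out = 24)

# "%w" runs 0-6 from Sunday, so add 1 to get 1 = Sunday, ..., 7 = Saturday
day_no <- as.numeric(strftime(dates, format = "%w")) + 1

# Placeholder values; only the second element of start matters for this check
x <- ts(seq_along(dates), frequency = 7, start = c(1, day_no[1]))

# Position of each observation within its 7-day period
pos <- as.numeric(cycle(x))

# TRUE when every observation sits in the column matching its weekday
aligned <- all(pos == day_no)
aligned
```

If aligned is FALSE for your own data, the usual causes are a wrong day number in start or missing days in the series, since ts assumes equally spaced observations with no gaps.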