I'm looking for a package to detect patterns such as seasonality. I have a data frame with two columns: Day (Date) and Visits.
When I plot the data, I see that website visits are higher in the summer months than in the other months, and I can see this pattern across 10 years.
The problem is that I want to analyse the seasonality in data from hundreds of websites.
Could you give me an example of how to detect this pattern in the time series?
Facebook released the prophet package to simplify time series analysis. There are tons of other ways to look for seasonality, but I think prophet is the easiest to use without tweaking. I recommend reading Facebook's documentation.
First let's create a sample of your data.
library(tidyverse)
website <-
  tibble(date = seq(as.Date('2015/01/01'), as.Date('2017/01/01'), by = "day"),
         visits = round(rnorm(732, mean = 327, sd = 100)))
Let's increase the website traffic during the summer.
library(lubridate)
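website <-
  # June to August get +10 visits; naming the column (visits =) overwrites it
  # instead of adding a new, unnamed column
  mutate(website, visits = ifelse(month(date) %in% c(6, 7, 8), visits + 10, visits))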
Now for the prophet calculations!
library(prophet)
website <- website %>%
  rename(ds = date, y = visits)
m <- prophet(website)
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
Visualize the results.
plot(m, forecast)

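It looks like there's more traffic in the summer, but it's hard to be certain from this plot. Fortunately, prophet has a function that plots the individual components of the model: the trend plus the weekly and yearly seasonality (daily seasonality is only fitted when the data are sub-daily).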
prophet_plot_components(m, forecast)

See that bump in the "yearly" chart around June to August? That's the extra summer traffic showing up as yearly seasonality.
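If you also want a number rather than only a plot, a quick cross-check that doesn't need prophet is to average the visits by calendar month and compare summer days with the rest of the year. This is a minimal base-R sketch that rebuilds sample data like the one above; the `set.seed()` call and the names `monthly_profile` and `summer_lift` are illustrative additions, not part of the original answer.

```r
# Minimal sketch: mean daily visits per calendar month (base R only)
set.seed(2015)  # added so the simulated data are reproducible
dates  <- seq(as.Date("2015-01-01"), as.Date("2017-01-01"), by = "day")
visits <- round(rnorm(length(dates), mean = 327, sd = 100))
month  <- as.integer(format(dates, "%m"))
visits <- ifelse(month %in% c(6, 7, 8), visits + 10, visits)  # same +10 summer boost

# Mean visits for each calendar month (1 = January, ..., 12 = December)
monthly_profile <- tapply(visits, month, mean)

# Average summer day minus average non-summer day
is_summer   <- month %in% c(6, 7, 8)
summer_lift <- mean(visits[is_summer]) - mean(visits[!is_summer])

print(round(monthly_profile, 1))
print(round(summer_lift, 1))
```

With a boost as small as +10 against a standard deviation of 100, the lift can be hard to separate from noise; with real traffic you would compare it against the month-to-month variation you normally see. Because it's one `tapply()` call per site, it's cheap to repeat for many websites.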
In response to the comments, here's a quick and easy way to test for a month effect within each website: it fits an ANOVA to each site separately. This example gives website B a seasonal effect, which you can see in its statistic and p.value columns.
First create the demo data...
library(tidyverse)
library(lubridate)
library(purrr)
library(broom)
website <-
  tibble(
    site = c(rep("A", 732), rep("B", 732), rep("C", 732)),
    date = rep(seq(as.Date('2015/01/01'), as.Date('2017/01/01'), by = "day"), 3),
    # the same simulated noise is reused for every site, so A and C are identical
    visits = rep(round(rnorm(732, mean = 327, sd = 100)), 3)
  ) %>%
  mutate(month = month(date))
website <-
  mutate(website, visits = ifelse(month %in% c(6, 7, 8) & site == "B",
                                  visits + 1000, visits))
Now use the wonders of the tidyverse to run the test across each group...
website %>%
  split(.$site) %>%
  map(~ tidy(aov(visits ~ month, data = .)))
#$A
# term df sumsq meansq statistic p.value
#1 month 1 3645.896 3645.896 0.3529069 0.5526563
#2 Residuals 730 7541662.108 10331.044 NA NA
#$B
# term df sumsq meansq statistic p.value
#1 month 1 1086355 1086355.5 5.426011 0.02011086
#2 Residuals 730 146155160 200212.5 NA NA
#$C
# term df sumsq meansq statistic p.value
#1 month 1 3645.896 3645.896 0.3529069 0.5526563
#2 Residuals 730 7541662.108 10331.044 NA NA
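In the output above, `month` has df = 1 because it is stored as a number, so `aov()` fits a single linear slope across month numbers 1 to 12. A summer peak in the middle of the year is only partly captured by a slope, which is why site B's p-value is only 0.02 despite a +1000 boost. Treating month as a factor fits one mean per month (11 df) and tests whether any month differs from the others. Below is a base-R sketch of that variant (using `lm()` plus `anova()`, which gives the same F test as `aov()`), rebuilding the demo data with an added `set.seed()`; the `month_anova` name is illustrative. Real daily traffic is usually autocorrelated, so treat the p-values as approximate.

```r
# Minimal sketch: one-way ANOVA per site with month as a factor (base R only)
set.seed(2015)  # added for reproducibility
dates <- seq(as.Date("2015-01-01"), as.Date("2017-01-01"), by = "day")
noise <- round(rnorm(length(dates), mean = 327, sd = 100))
website <- data.frame(
  site   = rep(c("A", "B", "C"), each = length(dates)),
  date   = rep(dates, times = 3),
  visits = rep(noise, times = 3)  # same noise for every site, as in the demo
)
website$month <- as.integer(format(website$date, "%m"))

# Give site B a large summer effect, as in the demo
boost <- website$month %in% c(6, 7, 8) & website$site == "B"
website$visits[boost] <- website$visits[boost] + 1000

# factor(month) fits one mean per month (11 df) instead of a single slope (1 df)
month_anova <- do.call(rbind, lapply(split(website, website$site), function(d) {
  fit <- anova(lm(visits ~ factor(month), data = d))
  data.frame(site = d$site[1], df = fit[1, "Df"],
             statistic = fit[1, "F value"], p.value = fit[1, "Pr(>F)"])
}))
print(month_anova)
```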
Note that this is not the ideal method for performing time series analysis, but it answers the specific question that you're asking.
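With hundreds of websites, a single score per site can be easier to screen than hundreds of component plots. One option, a different technique from both prophet and ANOVA, is an STL decomposition (`stats::stl()`) of monthly average visits, summarised with the seasonal-strength measure max(0, 1 - Var(R) / Var(S + R)) described in Hyndman and Athanasopoulos's *Forecasting: Principles and Practice*, where S is the seasonal component and R the remainder. This is a base-R sketch assuming at least two full years of daily data per site; the simulated sites, the helper `make_site()`, and site B's +150 summer boost are hypothetical.

```r
# Minimal sketch: score each site's yearly seasonality with STL (base R only)
set.seed(2015)  # added for reproducibility
dates <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "day")

# Hypothetical helper: simulate one site, optionally boosting June to August
make_site <- function(name, summer_boost) {
  visits <- round(rnorm(length(dates), mean = 327, sd = 100))
  summer <- as.integer(format(dates, "%m")) %in% c(6, 7, 8)
  data.frame(site = name, date = dates, visits = visits + summer * summer_boost)
}
website <- rbind(make_site("A", 0), make_site("B", 150), make_site("C", 0))

seasonal_strength <- function(d) {
  # Monthly means (not totals), so shorter months don't look "seasonal"
  monthly <- tapply(d$visits, format(d$date, "%Y-%m"), mean)
  first <- names(monthly)[1]
  y <- ts(as.numeric(monthly),
          start = c(as.integer(substr(first, 1, 4)), as.integer(substr(first, 6, 7))),
          frequency = 12)
  # STL with a fixed seasonal pattern; columns are seasonal, trend, remainder
  parts <- stl(y, s.window = "periodic")$time.series
  remainder <- parts[, "remainder"]
  # Seasonal strength: 1 - Var(R) / Var(S + R), floored at 0
  max(0, 1 - var(remainder) / var(parts[, "seasonal"] + remainder))
}

strength <- sapply(split(website, website$site), seasonal_strength)
print(round(sort(strength, decreasing = TRUE), 2))
```

Scores near 1 indicate a strong, repeating yearly pattern, and the noise-only sites A and C should score well below site B. Sorting the scores lets you shortlist the most seasonal sites and then look at those in more detail with prophet.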