Consider the following example
library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))
Source: local data frame [25 x 3]
time values day_of_week
<date> <dbl> <fctr>
1 2014-02-24 30 Mon
2 2014-02-25 45 Tues
3 2014-02-26 30 Wed
4 2014-02-27 50 Thurs
5 2014-02-28 50 Fri
6 2014-03-01 20 Sat
7 2014-03-02 35 Sun
8 2014-03-03 50 Mon
9 2014-03-04 35 Tues
10 2014-03-05 35 Wed
I would like to aggregate this dataframe by week.
That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a Monday to Monday
cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).
Then, I would simply like to count the mean of values
for each week.
For instance, in the example above, one would compute the average of values
between Monday February 24th to Sunday March 2nd, and so on.
How can I do that?
In the tidyverse,
df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))
## # A tibble: 5 × 2
## week value
## <dbl> <dbl>
## 1 8 37.50000
## 2 9 38.57143
## 3 10 38.57143
## 4 11 36.42857
## 5 12 45.00000
Or use isoweek
instead:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <int> <dbl>
## 1 9 37.14286
## 2 10 40.71429
## 3 11 35.00000
## 4 12 42.50000
Or cut.Date
:
df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000
which you can tell to start on Sunday, if you prefer:
df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>%
summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000
If you want to shift to, say, Tuesday start, add one to your dates:
df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000
Labels will be off, though. If using cut
, consider the implications of its include.lowest
and right
parameters, documented at ?cut
.
why not straight up use floor_date
and an integer to adjust the start date of the week?
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = weekdays(time))
# week wednesday to tuesday
df2 %>% group_by(Week = floor_date(time-3, unit="week")) %>%
summarize(WeeklyAveDist=mean(values), mean(values), min_date = min(time), max_date = max(time)) %>% mutate(weekdays(min_date), weekdays(max_date)))
Week WeeklyAveDist mean.values. min_date max_date
1 2014-02-16 37.50000 37.50000 2014-02-24 2014-02-25
2 2014-02-23 38.57143 38.57143 2014-02-26 2014-03-04
3 2014-03-02 38.57143 38.57143 2014-03-05 2014-03-11
4 2014-03-09 36.42857 36.42857 2014-03-12 2014-03-18
5 2014-03-16 45.00000 45.00000 2014-03-19 2014-03-20
weekdays.min_date. weekdays.max_date.
1 Monday Tuesday
2 Wednesday Tuesday
3 Wednesday Tuesday
4 Wednesday Tuesday
5 Wednesday Thursday
# Week Thursday to Wednesday
df2 %>% group_by(Week = floor_date(time-4, unit="week")) %>%
summarize(WeeklyAveDist=mean(values), mean(values), min_date = min(time), max_date = max(time)) %>% mutate(weekdays(min_date), weekdays(max_date)))
Week WeeklyAveDist mean.values. min_date max_date
1 2014-02-16 35.00000 35.00000 2014-02-24 2014-02-26
2 2014-02-23 39.28571 39.28571 2014-02-27 2014-03-05
3 2014-03-02 37.14286 37.14286 2014-03-06 2014-03-12
4 2014-03-09 40.00000 40.00000 2014-03-13 2014-03-19
5 2014-03-16 40.00000 40.00000 2014-03-20 2014-03-20
weekdays.min_date. weekdays.max_date.
1 Monday Wednesday
2 Thursday Wednesday
3 Thursday Wednesday
4 Thursday Wednesday
5 Thursday Thursday
aggregate(df2$values,by=list(week(df2$time)),mean)
Group.1 x 1 8 30.00000 2 9 40.00000 3 10 36.42857 4 11 37.85714 5 12 43.33333
This uses the week
function of lubridate and gives the week number of the week in the year.
To control which day of the week is the starting day just refer to this thread on that topic:
Changing lubridate function to start on Monday rather than Sunday
The solution from that thread by nograpes suggests that if you want a custom version of the week()
function using an arbitrary day of the week as the beginning of the week that you just construct it from base R like this:
start.of.week <- function(date) date - (setNames(c(6,0:5),0:6) [strftime(date,'%w')]) end.of.week <- function(date) date + (setNames(c(0,6:1),0:6) [strftime(date,'%w')]) start.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27'))) # "2013-12-30" "2014-09-29" "2014-09-22" "2014-09-22" end.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27'))) # "2014-01-05" "2014-10-05" "2014-09-28" "2014-09-28"
In the future lubridate
will have this option for an arbitrary start day for weeks, but Hadley hasn't got around to adding it yet (https://github.com/hadley/lubridate/issues/257).
Just this once, after some research, I actually think I came up with a better solution that
Example below for weeks starting on a thursday. The weeks will be labeled by their first day a given cycle.
library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
day_of_week = wday(time, label = FALSE))
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
tmp_1 = (as.integer(day_of_week) - 5),
tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
which gives
> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With