I have a dataframe where one of the columns contains dates (some dates appear multiple times). I want to aggregate the dates by week. The best way I can think of this is to round down the dates to the nearest Monday. How can I round down dates? How can I transform this list of dates into weeks?
2016-04-04
2016-04-05
2016-04-06
2016-04-07
2016-04-08
2016-04-09
2016-04-10
2016-04-11
2016-04-12
2016-04-13
2016-04-14
Expected output should be this:
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-11
2016-04-11
2016-04-11
2016-04-11
With the week_start
parameter in the floor_date
function of the lubridate
package you have the option to specify the beginning of the week since lubridate version 1.7.0. This allows you to perform:
library(lubridate)
dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)
floor_date(dates, "weeks", week_start = 1)
I would post it as a comment to Sraffa's response but I don't have the reputation.
cut()
from base R has two methods for objects of class Date
and POSIXt
which assume that weeks start on Monday by default (but may be changed to Sunday using start.on.monday = FALSE
).
dates <- c("2016-04-04", "2016-04-05", "2016-04-06", "2016-04-07", "2016-04-08",
"2016-04-09", "2016-04-10", "2016-04-11", "2016-04-12", "2016-04-13",
"2016-04-14")
result <- data.frame(
dates,
cut_Date = cut(as.Date(dates), "week"),
cut_POSIXt = cut(as.POSIXct(dates), "week"),
stringsAsFactors = FALSE)
result
# dates cut_Date cut_POSIXt
#1 2016-04-04 2016-04-04 2016-04-04
#2 2016-04-05 2016-04-04 2016-04-04
#3 2016-04-06 2016-04-04 2016-04-04
#4 2016-04-07 2016-04-04 2016-04-04
#5 2016-04-08 2016-04-04 2016-04-04
#6 2016-04-09 2016-04-04 2016-04-04
#7 2016-04-10 2016-04-04 2016-04-04
#8 2016-04-11 2016-04-11 2016-04-11
#9 2016-04-12 2016-04-11 2016-04-11
#10 2016-04-13 2016-04-11 2016-04-11
#11 2016-04-14 2016-04-11 2016-04-11
Note that cut()
returns factors which is perfect for aggregation as requested by the OP:
str(result)
#'data.frame': 11 obs. of 3 variables:
# $ dates : chr "2016-04-04" "2016-04-05" "2016-04-06" "2016-04-07" ...
# $ cut_Date : Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
# $ cut_POSIXt: Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
However, for plotting aggregated values with ggplot2
(and if there is a large number of weeks which might clutter the axis) it might be better to switch from a discrete time scale to a continuous time scale. Then it is necessary to coerce factors back to Date
or POSIXct
:
as.Date(as.character(result$cut_Date))
as.POSIXct(as.character(result$cut_Date))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With