
R: Round down dates to first day of the week

Tags:

r

I have a data frame in which one of the columns contains dates (some dates appear multiple times). I want to aggregate the data by week. The best approach I can think of is to round each date down to the most recent Monday, so that a Monday stays unchanged. How can I round dates down like this, i.e. transform the following list of dates into week start dates?

2016-04-04
2016-04-05
2016-04-06
2016-04-07
2016-04-08
2016-04-09
2016-04-10
2016-04-11
2016-04-12
2016-04-13
2016-04-14

Expected output should be this:

2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-04
2016-04-11
2016-04-11
2016-04-11
2016-04-11
Jaol asked May 05 '17 20:05



2 Answers

Since lubridate version 1.7.0, the floor_date function has a week_start parameter that lets you specify the day on which the week begins (1 = Monday). This allows you to do:

library(lubridate)
dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)
floor_date(dates, "weeks", week_start = 1)

I would post it as a comment to Sraffa's response but I don't have the reputation.
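
If installing lubridate is not an option, the same rounding can be sketched in base R. This is only an illustration, not part of the answer above: the helper name floor_to_monday is made up, and it relies on format(x, "%u") returning the ISO weekday number (1 = Monday, 7 = Sunday).

```r
# Hypothetical helper (name invented for this sketch):
# round each date down to the Monday of its week using base R only.
floor_to_monday <- function(x) {
  x <- as.Date(x)
  # "%u" is the ISO 8601 weekday as a number: 1 = Monday, ..., 7 = Sunday
  weekday <- as.integer(format(x, "%u"))
  # Subtract the number of days that have passed since Monday
  x - (weekday - 1L)
}

dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)
floor_to_monday(dates)
# 2016-04-04 for the first seven dates, 2016-04-11 for the last four
```

Because the weekday is taken as a number rather than a name, the result should not depend on the session locale.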

Patrick Glettig answered Oct 07 '22 18:10



cut() from base R has methods for objects of class Date and POSIXt. Both assume that weeks start on Monday by default; this can be changed to Sunday with start.on.monday = FALSE.

dates <- c("2016-04-04", "2016-04-05", "2016-04-06", "2016-04-07", "2016-04-08", 
           "2016-04-09", "2016-04-10", "2016-04-11", "2016-04-12", "2016-04-13", 
           "2016-04-14")
result <- data.frame(
  dates,
  cut_Date = cut(as.Date(dates), "week"),
  cut_POSIXt = cut(as.POSIXct(dates), "week"),
  stringsAsFactors = FALSE)

result
#        dates   cut_Date cut_POSIXt
#1  2016-04-04 2016-04-04 2016-04-04
#2  2016-04-05 2016-04-04 2016-04-04
#3  2016-04-06 2016-04-04 2016-04-04
#4  2016-04-07 2016-04-04 2016-04-04
#5  2016-04-08 2016-04-04 2016-04-04
#6  2016-04-09 2016-04-04 2016-04-04
#7  2016-04-10 2016-04-04 2016-04-04
#8  2016-04-11 2016-04-11 2016-04-11
#9  2016-04-12 2016-04-11 2016-04-11
#10 2016-04-13 2016-04-11 2016-04-11
#11 2016-04-14 2016-04-11 2016-04-11
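
To illustrate the start.on.monday argument mentioned above, here is a small sketch that compares both settings. The three dates are picked for illustration only (a Wednesday, a Sunday and a Monday from the question's range).

```r
# 2016-04-06 is a Wednesday, 2016-04-10 a Sunday, 2016-04-11 a Monday
d <- as.Date(c("2016-04-06", "2016-04-10", "2016-04-11"))

# Default: weeks start on Monday
monday_weeks <- cut(d, "week")

# Alternative: weeks start on Sunday
sunday_weeks <- cut(d, "week", start.on.monday = FALSE)

data.frame(date = d, monday_weeks, sunday_weeks)
#         date monday_weeks sunday_weeks
# 1 2016-04-06   2016-04-04   2016-04-03
# 2 2016-04-10   2016-04-04   2016-04-10
# 3 2016-04-11   2016-04-11   2016-04-10
```

Note how the Sunday, 2016-04-10, closes a week under the default setting but opens one when start.on.monday = FALSE.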

Note that cut() returns factors, which is convenient for aggregation as requested by the OP:

str(result)
#'data.frame':  11 obs. of  3 variables:
# $ dates     : chr  "2016-04-04" "2016-04-05" "2016-04-06" "2016-04-07" ...
# $ cut_Date  : Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
# $ cut_POSIXt: Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
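
As a rough illustration of that aggregation step, the sketch below sums a value column per week. The data frame df and its value column are invented for this example, since the question does not show the other columns.

```r
# Made-up example data: a date column plus a numeric column to aggregate
df <- data.frame(
  date  = as.Date(c("2016-04-04", "2016-04-05", "2016-04-10",
                    "2016-04-11", "2016-04-14")),
  value = c(1, 2, 3, 4, 5)
)

# Week factor, labelled by the Monday that starts each week
df$week <- cut(df$date, "week")

# Sum the values within each week
weekly <- aggregate(value ~ week, data = df, FUN = sum)
weekly
#         week value
# 1 2016-04-04     6
# 2 2016-04-11     9
```

If only the number of rows per week is needed, table(df$week) gives the counts directly.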

However, when plotting aggregated values with ggplot2, a large number of weeks can clutter a discrete axis, so it may be better to switch from a discrete time scale to a continuous one. This requires coercing the factors back to Date or POSIXct:

as.Date(as.character(result$cut_Date))
as.POSIXct(as.character(result$cut_POSIXt))
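
Putting the two steps together, a minimal sketch of a weekly summary whose week column is a proper Date (and so suitable for a continuous axis) might look like this. The object names week_factor and counts are arbitrary choices for this example.

```r
dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)

# Week factor, labelled by the Monday that starts each week
week_factor <- cut(dates, "week")

# Count observations per week; keep the labels as character, not factor
counts <- as.data.frame(table(week = week_factor), stringsAsFactors = FALSE)

# Coerce the week labels back to Date
counts$week <- as.Date(counts$week)

str(counts)
```

The resulting counts$week column can then be mapped to a continuous date axis instead of a discrete one.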

Uwe answered Oct 07 '22 16:10
