This question asks about aggregation by time period in R, what pandas calls resampling. The most useful answer there uses the xts package to group by a given time period, applying some function such as sum() or mean().
One of the comments suggested there was something similar in lubridate, but didn't elaborate. Can someone provide an idiomatic example using lubridate? I've read through the lubridate vignette a couple of times and can imagine some combination of lubridate and plyr; however, I want to make sure there isn't an easier way that I'm missing.
To make the example more real, let's say I want the daily sum of bicycles traveling northbound from this dataset:
library(lubridate)
library(reshape2)

# Hourly bicycle counts from Seattle's open data portal
bikecounts <- read.csv(url("http://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"),
                       header = TRUE, stringsAsFactors = FALSE)
names(bikecounts) <- c("Date", "Northbound", "Southbound")
Data looks like this:
> head(bikecounts)
                    Date Northbound Southbound
1 10/02/2012 12:00:00 AM          0          0
2 10/02/2012 01:00:00 AM          0          0
3 10/02/2012 02:00:00 AM          0          0
4 10/02/2012 03:00:00 AM          0          0
5 10/02/2012 04:00:00 AM          0          0
6 10/02/2012 05:00:00 AM          0          0
lubridate::year should also do the trick once the data is in 'Date' format, as suggested by @akrun. The cleanest solution is to coerce that variable to Date and use either format() or lubridate's accessor functions to extract parts of it. For example: x <- as.Date("01/01/2009", format = "%m/%d/%Y"); lubridate::year(x).
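A minimal sketch of that approach (nothing here beyond base R and lubridate):

# Coerce the string to Date once, then extract whichever parts you need
x <- as.Date("01/01/2009", format = "%m/%d/%Y")
format(x, "%Y")      # "2009" as a character string, via base R
lubridate::year(x)   # 2009 as a number
lubridate::month(x)  # 1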
I don't know why you'd use lubridate for this. If you're just looking for something less awesome than xts you could try this
tapply(bikecounts$Northbound, as.Date(bikecounts$Date, format="%m/%d/%Y"), sum)
Basically, you just need to split by Date, then apply a function (that split-then-apply step is spelled out below).
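Here's the same thing spelled out with base R's split() and sapply() -- a sketch; the tapply() call above does this in one step:

# Split the counts by calendar day, then sum each group
groups <- split(bikecounts$Northbound,
                as.Date(bikecounts$Date, format = "%m/%d/%Y"))
daily  <- sapply(groups, sum)
head(daily)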
lubridate could be used for creating a grouping factor for split-apply problems. So, for example, if you want the sum for each month (ignoring year)
tapply(bikecounts$Northbound, month(mdy_hms(bikecounts$Date)), sum)
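If you instead want each year-month kept separate, one hedged variant uses lubridate's floor_date(), which rounds a date-time down to the start of its month:

# Group by calendar month *including* the year
ts <- mdy_hms(bikecounts$Date)
tapply(bikecounts$Northbound, floor_date(ts, "month"), sum)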
But, it's just using wrappers for base R functions, and in the case of the OP, I think the base R function as.Date is the easiest (as evidenced by the fact that the other answers also ignored your request to use lubridate ;-) ).
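For completeness, the same daily sum can also be written with base R's aggregate():

# Daily northbound totals, base R only
aggregate(Northbound ~ as.Date(Date, format = "%m/%d/%Y"),
          data = bikecounts, FUN = sum)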
Something that wasn't covered by the answer to the other question linked to in the OP is split.xts. period.apply splits an xts at endpoints and applies a function to each group. You can find endpoints that are useful for a given task with the endpoints function. For example, if you have an xts object x, then endpoints(x, "months") would give you the row numbers of the last row of each month. split.xts leverages that to split an xts object: split(x, "months") would return a list of xts objects where each component covers a different month.
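A sketch of that on this dataset, assuming a single-column xts built from the northbound counts (the answer further down builds a two-column one the same way):

library(xts)
x <- xts(bikecounts$Northbound, order.by = mdy_hms(bikecounts$Date))
endpoints(x, "months")        # 0, then the last row number of each month
monthly <- split(x, "months") # one xts object per month
sapply(monthly, sum)          # monthly totals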
Although split.xts() and endpoints() are primarily intended for xts objects, they also work on some other objects, including plain time-based vectors. Even if you don't want to use xts objects, you may still find uses for endpoints() because of its convenience or its speed (it's implemented in C):
> split.xts(as.Date("1970-01-01") + 1:10, "weeks")
[[1]]
[1] "1970-01-02" "1970-01-03" "1970-01-04"
[[2]]
[1] "1970-01-05" "1970-01-06" "1970-01-07" "1970-01-08" "1970-01-09"
[6] "1970-01-10" "1970-01-11"
> endpoints(as.Date("1970-01-01") + 1:10, "weeks")
[1] 0 3 10
I think lubridate's best use in this problem is for parsing the "Date" strings into POSIXct objects, i.e. the mdy_hms function in this case.
Here's an xts solution that uses lubridate to parse the "Date" strings:
library(xts)  # for xts(), endpoints(), period.apply(), apply.daily()
x <- xts(bikecounts[, -1], mdy_hms(bikecounts$Date))
period.apply(x, endpoints(x, "days"), sum)  # note: sum() here adds both columns together
apply.daily(x, sum)                         # identical to above
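As a quick sanity check (a sketch, assuming the objects defined above), the daily northbound sums from xts should match the base-R tapply() result:

daily_tapply <- tapply(bikecounts$Northbound,
                       as.Date(bikecounts$Date, format = "%m/%d/%Y"), sum)
daily_xts <- apply.daily(x[, "Northbound"], sum)
all.equal(as.numeric(coredata(daily_xts)), as.numeric(daily_tapply))  # should be TRUE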
For this specific task, xts also has an optimized period.sum function (written in Fortran) that is very fast:
period.sum(x[, "Northbound"], endpoints(x, "days"))  # period.sum expects a univariate series