I am trying to calculate the frequency of bikes that are taken by people using a dataset provided by Leada.
Here is the code:
library(dplyr)
setAs("character", "POSIXlt", function(from) strptime(from, format = "%m/%d/%y %H:%M"))
d <- read.csv("http://mandrillapp.com/track/click/30315607/s3-us-west-1.amazonaws.com?p=eyJzIjoiemxlVjNUREczQ2l5UFVPeEFCalNUdmlDYTgwIiwidiI6MSwicCI6IntcInVcIjozMDMxNTYwNyxcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3MzLXVzLXdlc3QtMS5hbWF6b25hd3MuY29tXFxcL2RhdGF5ZWFyXFxcL2Jpa2VfdHJpcF9kYXRhLmNzdlwiLFwiaWRcIjpcImEyODNiNjMzOWJkOTQxMGM5ZjlkYzE0MmQ0NDQ5YmU4XCIsXCJ1cmxfaWRzXCI6W1wiMTVlYzMzNWM1NDRlMTM1ZDI0YjAwODE4ZjI5YTdkMmFkZjU2NWQ2MVwiXX0ifQ",
colClasses = c("numeric", "numeric", "POSIXlt", "factor", "numeric", "POSIXlt", "factor", "numeric", "numeric", "factor", "character"),
stringsAsFactors = T)
names(d)[9] <- "BikeNo"
d <- tbl_df(d)
d <- d %>% mutate(Weekday = factor(weekdays(Start.Date)))
d %>% group_by(Weekday)
%>% summarise(Total = n())
%>% select(Weekday, Total)
It is strange but dplyr does not want to group data by Weekday saying:
Error: column 'Start.Date' has unsupported type
Why it cares about Start.Date column where I group by a factor? You can run the code locally to reproduce the error: it will download the data automatically.
P.S. I am using dplyr version: dplyr_0.3.0.2
The lubridate package is useful when dealing with dates. Here is the code to parse Start.Date and End.Date, extract week days, then group by week days:
library(dplyr)
library(lubridate)
# For some reason your instruction to load the csv directly from a url
# didn't work. I save the csv to a temporary directory.
d <- read.csv("/tmp/bike_trip_data.csv", colClasses = c("numeric", "numeric", "character", "factor", "numeric", "character", "factor", "numeric", "numeric", "factor", "character"), stringsAsFactors = T)
names(d)[9] <- "BikeNo"
d <- tbl_df(d)
d <- d %>%
mutate(
Start.Date = parse_date_time(Start.Date,"%m/%d/%y %H:%M"),
End.Date = parse_date_time(End.Date,"%m/%d/%y %H:%M"),
Weekday = wday(Start.Date, label=TRUE, abbr=FALSE))
d %>%
group_by(Weekday) %>%
summarise(Total = n())
# Weekday Total
# 1 Sunday 10587
# 2 Monday 23138
# 3 Tuesday 24678
# 4 Wednesday 23651
# 5 Thursday 25265
# 6 Friday 24283
# 7 Saturday 12413
I am sorry if this issue is long forgotten, but it weirds me out to see everyone recommending to convert to POSIX.ct or character when I have been using the much simpler solution of calling the arrange function from the plyr package using plyr::arrange
, as it doesn't seem to have issues with the POSIXlt formats. As I am usually not the one finding the easiest solution for a problem in R, I am starting to think that there is something wrong with it. Does it not do the same as the dplyr version?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With