I have a data frame that has similar structure to the following:
set.seed(123)
df<-data_frame(SectionName = rep(letters[1:2], 50),
TimeSpentSeconds = sample(0:360, 100, replace = TRUE),
Correct = sample(0:1, 100, replace = TRUE))
I want summarise this data frame by taking all values of TimeSpentSeconds that fall into certain ranges (less than 30, between 30-60, between 60-90, ..., greater than 180), label the times as those ranges, group them by SectionName, and find the sum of the Correct column so that the resulting data frame looks (something) like this:
TimeGroup SectionName Correct
<fct> <chr> <int>
1 LessThan30Secs a 2
2 LessThan30Secs b 3
3 30-60 Seconds a 4
4 30-60 Seconds b 3
5 60-90 Seconds a 2
6 60-90 Seconds b 3
7 90-120 Seconds a 4
8 90-120 Seconds b 0
9 120-150 Seconds a 4
10 120-150 Seconds b 0
11 150-180 Seconds a 1
12 150-180 Seconds b 2
13 GreaterThan180Seconds a 11
14 GreaterThan180Seconds b 11
I was able to successfully do this with the following if-else code where I mutated all of the times into a new column with the appropriate label, grouped, and summarised:
x <- c("LessThan30Secs", "30-60 Seconds", "60-90 Seconds","90-120 Seconds",
"120-150 Seconds", "150-180 Seconds", "GreaterThan180Seconds")
df %>%
mutate(TimeGroup = if_else(TimeSpentSeconds >= 0 & TimeSpentSeconds <= 30, "LessThan30Secs",
if_else(TimeSpentSeconds > 30 & TimeSpentSeconds <= 60, "30-60 Seconds",
if_else(TimeSpentSeconds > 60 & TimeSpentSeconds <= 90, "60-90 Seconds",
if_else(TimeSpentSeconds > 90 & TimeSpentSeconds <= 120, "90-120 Seconds",
if_else(TimeSpentSeconds > 120 & TimeSpentSeconds <= 150, "120-150 Seconds",
if_else(TimeSpentSeconds > 150 & TimeSpentSeconds <= 180, "150-180 Seconds",
if_else(TimeSpentSeconds > 180, "GreaterThan180Seconds", "")))))))) %>%
mutate(TimeGroup = factor(TimeGroup, levels = x)) %>%
arrange(TimeGroup) %>%
group_by(TimeGroup, SectionName) %>%
summarise(Correct = sum(Correct))
But, there just has to be a better way to do this. I considered writing a function, but didn't get very far as I'm not great at function writing.
Does anyone have any ideas on a more elegant way to accomplish this same output through a dplyr method I didn't think of, writing a custom function maybe utilizing the purrr package at some point, or some other r function?
case_when()
will do what you want. Its a tidy alternative to nested ifelse()
statements.
library(dplyr)
mutate(df,
TimeGroup = case_when(
TimeSpentSeconds <= 30 ~ "30 Seconds or less",
TimeSpentSeconds <= 60 ~ "31-60 Seconds",
TimeSpentSeconds <= 90 ~ "61-90 Seconds",
TimeSpentSeconds <= 120 ~ "91-120 Seconds",
TimeSpentSeconds <= 150 ~ "121-150 Seconds",
TimeSpentSeconds <= 180 ~ "151-180 Seconds",
TimeSpentSeconds > 180 ~ "Greater Than 180 Seconds",
TRUE ~ NA_character_)
)
The last argument is a catch all for records that don't fit any of the criteria, such as if time is somehow less than 0 seconds.
We can do this easily with cut
(or findInterval
) instead of multiple nested ifelse
statements
lbls <- c('LessThan30secs', '30-60 Seconds', '60-90 Seconds',
'90-120 Seconds', '120-150 Seconds', '150-180 Seconds', 'GreaterThan180Seconds')
df %>%
group_by(TimeGroup = cut(TimeSpentSeconds,
breaks = c(seq(0, 180, by = 30), Inf), labels = lbls),
SectionName) %>%
summarise(Correct = sum(Correct)) %>%
na.omit
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With