I often want to perform tidyr::spread
and dplyr::summarise
in a "single step" to aggregate data by group. What I want is shown in expected
. I can get expected
by performing summarise
and spread
separately and combine the results with a dplyr::full_join
but I'm looking for alternative approaches that avoid full_join. Bona fide single-step approaches are not necessary.
df <- data.frame(
id = rep(letters[1], 2),
val1 = c(10, 20),
val2 = c(100, 200),
key = c("A", "B"),
value = c(1, 2))
library(tidyverse)
result1 <- df %>%
group_by(id) %>%
summarise(
val1 = min(val1),
val2 = max(val2)
)
# A tibble: 1 x 3
# id val1 val2
# <fctr> <dbl> <dbl>
# 1 a 10.0 200
result2 <- df %>%
select(id, key, value) %>%
group_by(id) %>%
spread(key, value)
# A tibble: 1 x 3
# Groups: id [1]
# id A B
# * <fctr> <dbl> <dbl>
# 1 a 1.00 2.00
expected <- full_join(result1, result2, by="id")
# A tibble: 1 x 5
# id val1 val2 A B
# <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a 10.0 200 1.00 2.00
I suspect your data may have more edge cases that require some modification, but why don't you simply spread
then summarise
? You can specify the summary function separately per variable, so for A
and B
where you don't actually need to calculate anything (I'm assuming) you can just remove all the NA
:
df %>%
spread("key", "value") %>%
group_by(id) %>%
summarise(
val1 = min(val1),
val2 = max(val2),
A = mean(A, na.rm = TRUE),
B = mean(B, na.rm = TRUE)
)
# A tibble: 1 x 5
id val1 val2 A B
<fct> <dbl> <dbl> <dbl> <dbl>
1 a 10.0 200 1.00 2.00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With