I have a data frame with 2 columns, ID number and brand:
X1 X2
1234 A89
1234 A87
1234 A87
1234 A32
1234 A27
1234 A27
1235 A12
1235 A14
1235 A14
1236 A32
1236 A32
1236 A27
1236 A12
1236 A12
1236 A14
1236 A89
1236 A87
1237 A99
1237 A98
I want to find the top 3 brand combinations that occur together most frequently under the same ID number:
A89, A87
A32, A27
A12, A14
I tried:
library(dplyr)
df %>%
  group_by(X1, X2) %>%
  mutate(n = n()) %>%
  group_by(X1) %>%
  slice(which.max(n)) %>%
  select(-n)
But it doesn't work correctly. I would appreciate any thoughts or ideas!
Here's a way to do it in base R. We split X2 by X1 and then get every combination of two unique values within each subgroup. Then we count the combinations across all subgroups and grab the three most common ones.
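# Pair up the unique brands within each ID, then tabulate the pairs across IDs
with(data.frame(table(unlist(lapply(split(df$X2, df$X1), function(x) {
    u <- unique(x)  # size the combination on the unique values, not on all rows
    combn(u, min(2, length(u)), paste, collapse = "-")
})))),
    as.character(Var1[head(order(Freq, decreasing = TRUE), 3)]))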
#[1] "A12-A14" "A32-A27" "A89-A87"
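If the nested call is hard to read, the same idea can be unpacked into named steps. This is only a sketch: the helper `pairs_for_id` and its `sort_pairs` argument are hypothetical names introduced here for illustration, and the data frame is rebuilt inline from the values in the question so the block runs on its own.

```r
# Sample data copied from the question.
df <- data.frame(
  X1 = c(rep(1234L, 6), rep(1235L, 3), rep(1236L, 8), rep(1237L, 2)),
  X2 = c("A89", "A87", "A87", "A32", "A27", "A27",
         "A12", "A14", "A14",
         "A32", "A32", "A27", "A12", "A12", "A14", "A89", "A87",
         "A99", "A98"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every pair of distinct brands seen under one ID.
# sort_pairs = TRUE makes the pair label independent of row order,
# so "A89-A87" and "A87-A89" are counted as the same pair.
pairs_for_id <- function(x, sort_pairs = FALSE) {
  u <- unique(x)
  if (length(u) < 2) return(character(0))  # a single brand forms no pair
  if (sort_pairs) u <- sort(u)
  combn(u, 2, paste, collapse = "-")
}

# Step 1 and 2: split the brands by ID and build the pairs for each ID.
pairs_by_id <- lapply(split(df$X2, df$X1), pairs_for_id)

# Step 3: count how many IDs each pair appears under.
pair_counts <- as.data.frame(table(pair = unlist(pairs_by_id)),
                             stringsAsFactors = FALSE)

# Step 4: keep the three most frequent pairs.
top3 <- head(pair_counts[order(pair_counts$Freq, decreasing = TRUE), ], 3)
top3
```

One thing to be aware of: without sorting, a pair's label depends on the order in which the brands first appear under each ID. With this data, calling `pairs_for_id(x, sort_pairs = TRUE)` makes seven pairs tie at a count of 2, so the "top 3" would then depend on how ties are broken.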
DATA
df = structure(list(X1 = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L,
1235L, 1235L, 1235L, 1236L, 1236L, 1236L, 1236L, 1236L, 1236L,
1236L, 1236L, 1237L, 1237L), X2 = c("A89", "A87", "A87", "A32",
"A27", "A27", "A12", "A14", "A14", "A32", "A32", "A27", "A12",
"A12", "A14", "A89", "A87", "A99", "A98")), .Names = c("X1",
"X2"), class = "data.frame", row.names = c(NA, -19L))