Finding most frequent combinations

Tags: r

I have a data frame with 2 columns, ID number and brand:

X1     X2
1234   A89
1234   A87
1234   A87
1234   A32
1234   A27
1234   A27
1235   A12
1235   A14
1235   A14
1236   A32
1236   A32
1236   A27
1236   A12
1236   A12
1236   A14
1236   A89
1236   A87
1237   A99
1237   A98

I want to find the top 3 brand combinations that occur together most frequently under the same ID number:

A89, A87
A32, A27
A12, A14

I tried:

library(dplyr)

df %>%
  group_by(X1,X2) %>%
  mutate(n = n()) %>%
  group_by(X1) %>%
  slice(which.max(n)) %>%
  select(-n)

But it doesn't work correctly. I would appreciate any thoughts or ideas!

anrpet asked Mar 08 '23 10:03


1 Answer

Here's a way to do it in base R. We split X2 by X1, form every pair of distinct values within each subgroup, then tabulate the pairs across all subgroups and keep the three most common ones.

with(data.frame(table(unlist(lapply(split(df$X2, df$X1), function(x) {
    u = unique(x)  # distinct brands for this ID
    combn(u, min(2, length(u)), paste, collapse = "-")})))),
    as.character(Var1[head(order(Freq, decreasing = TRUE), 3)]))
#[1] "A12-A14" "A32-A27" "A89-A87"
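
The one-liner can be hard to follow, so here is a minimal step-by-step sketch of the same idea. The intermediate names (brands_by_id, pairs_by_id, pair_counts, top3) are made up for illustration, and skipping IDs with fewer than two distinct brands is an assumption, since the sample data has no such ID.

```r
# Sample data, same values as in the DATA section
df <- data.frame(
  X1 = rep(c(1234L, 1235L, 1236L, 1237L), times = c(6, 3, 8, 2)),
  X2 = c("A89", "A87", "A87", "A32", "A27", "A27",
         "A12", "A14", "A14",
         "A32", "A32", "A27", "A12", "A12", "A14", "A89", "A87",
         "A99", "A98"),
  stringsAsFactors = FALSE
)

# Step 1: distinct brands per ID, in order of first appearance
brands_by_id <- lapply(split(df$X2, df$X1), unique)

# Step 2: every two-brand pair within each ID.
# Assumption: an ID with fewer than two distinct brands contributes no pairs.
pairs_by_id <- lapply(brands_by_id, function(b) {
  if (length(b) < 2) return(character(0))
  combn(b, 2, paste, collapse = "-")
})

# Step 3: count each pair label across all IDs
pair_counts <- table(unlist(pairs_by_id))

# Step 4: keep the labels of the three most frequent pairs
top3 <- names(pair_counts)[head(order(as.vector(pair_counts), decreasing = TRUE), 3)]
print(top3)
```

Note that the pair label depends on the order in which brands first appear within an ID, so "A89-A32" and "A32-A89" are counted as different labels. If the pairs should be order-independent, one option is to use sort(unique(x)) in step 1, which can change the counts and produce more ties on this data.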

DATA

df = structure(list(X1 = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
1235L, 1235L, 1235L, 1236L, 1236L, 1236L, 1236L, 1236L, 1236L, 
1236L, 1236L, 1237L, 1237L), X2 = c("A89", "A87", "A87", "A32", 
"A27", "A27", "A12", "A14", "A14", "A32", "A32", "A27", "A12", 
"A12", "A14", "A89", "A87", "A99", "A98")), .Names = c("X1", 
"X2"), class = "data.frame", row.names = c(NA, -19L))
d.b answered Mar 28 '23 08:03