R: counting distinct combinations found in a data frame where columns are interchangeable

Question

I'm not sure what this problem is even called. Let's say I'm counting distinct combinations of values in 2 columns, but I want two rows to count as the same combination regardless of which column each value appears in. Here's what I mean:

df = data.frame(fruit1 = c("apple", "orange", "orange", "banana", "kiwi"),
                fruit2 = c("orange", "apple", "banana", "orange", "apple"),
                stringsAsFactors = FALSE)

# What I want: total number of fruit combinations, regardless of 
# which fruit comes first and which second.
# Eg 2 apple-orange, 2 banana-orange, 1 kiwi-apple

# What I know *doesn't* work:

table(df$fruit1, df$fruit2) 

# What *does* work:
library(dplyr)
df %>% group_by(fruit1, fruit2) %>% 
  transmute(fruitA = sort(c(fruit1, fruit2))[1],
            fruitB = sort(c(fruit1, fruit2))[2]) %>%
  group_by(fruitA, fruitB) %>%
  summarise(combinations = n())

I've got a way to make this work, as you can see, but is there a name for this general problem? It's sort of a combinatorics problem, but about counting combinations rather than generating them. And what if I had three or four columns of similar type? The above method doesn't generalize well. Tidyverse approaches most welcome!

BENY · Accepted Answer

Use apply with sort to put the values in each row of your data frame into a consistent order, then group by all columns and count:

data.frame(t(apply(df, 1, sort))) %>% group_by_all() %>% count()
# A tibble: 3 x 3
# Groups:   X1, X2 [3]
      X1     X2     n
  <fctr> <fctr> <int>
1  apple   kiwi     1
2  apple orange     2
3 banana orange     2
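
Since the question also asks about three or four columns, here is a minimal base-R sketch of the same sort-each-row idea, collapsing every row into a single key string. The data frame df3, its fruit3 column, and the "-" separator are made up for illustration and are not part of the original question; it also assumes there are no NA values and that no value contains the separator.

```r
# Hypothetical three-column data (not from the original question).
df3 <- data.frame(
  fruit1 = c("apple", "orange", "kiwi", "banana"),
  fruit2 = c("orange", "kiwi", "apple", "apple"),
  fruit3 = c("kiwi", "apple", "orange", "kiwi"),
  stringsAsFactors = FALSE
)

# Sort the values within each row and paste them into one key, so every
# ordering of the same set of fruits yields the same string.
key <- apply(df3, 1, function(row) paste(sort(row), collapse = "-"))

# Count how often each order-independent combination occurs.
counts <- as.data.frame(table(combination = key), stringsAsFactors = FALSE)
print(counts)
```

Here the first three rows are the same three fruits in different orders, so they collapse into one combination with a count of 3, and the fourth row forms its own combination. The same code should work unchanged for any number of columns of the same type.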
