I'm not sure what this problem is even called. Let's say I'm counting distinct combinations of two columns, but I want the count to ignore which column each value appears in. Here's what I mean:
df = data.frame(fruit1 = c("apple", "orange", "orange", "banana", "kiwi"),
                fruit2 = c("orange", "apple", "banana", "orange", "apple"),
                stringsAsFactors = FALSE)
# What I want: total number of fruit combinations, regardless of
# which fruit comes first and which second.
# E.g. 2 apple-orange, 2 banana-orange, 1 kiwi-apple
# What I know *doesn't* work:
table(df$fruit1, df$fruit2)
# What *does* work:
library(dplyr)
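# rowwise() sorts each row on its own; grouping by (fruit1, fruit2) instead
# would pick the wrong fruitB whenever the same ordered pair occurs twice.
df %>% rowwise() %>%
  transmute(fruitA = sort(c(fruit1, fruit2))[1],
            fruitB = sort(c(fruit1, fruit2))[2]) %>%
  group_by(fruitA, fruitB) %>%
  summarise(combinations = n())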
I've got a way to make this work, as you can see, but is there a name for this general problem? It's sort of a combinatorics problem but counting, not generating combinations. And what if I had three or four columns of similar type? The above method is poorly generalizable. Tidyverse approaches most welcome!
Use apply and sort to order the values within each row of the data frame, then use group_by and count:
library(dplyr)
data.frame(t(apply(df, 1, sort))) %>% group_by_all() %>% count()
# A tibble: 3 x 3
# Groups:   X1, X2 [3]
  X1     X2         n
  <fctr> <fctr> <int>
1 apple  kiwi       1
2 apple  orange     2
3 banana orange     2
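Because the sort happens per row, the same idea should carry over to three or more columns. Here is a minimal sketch of that, assuming a made-up three-column data frame: `df3`, its contents, and the `item1`, `item2`, ... column names are illustrative and not from the question. It keeps `group_by_all()` to match the line above; in dplyr 1.0 and later that function is superseded by `across()`.

```r
library(dplyr)

# Hypothetical three-column example data (not from the original question).
df3 <- data.frame(
  fruit1 = c("apple", "orange", "kiwi", "banana"),
  fruit2 = c("orange", "kiwi", "apple", "apple"),
  fruit3 = c("kiwi", "apple", "orange", "kiwi"),
  stringsAsFactors = FALSE
)

# Sort the values within each row so that column order no longer matters.
# apply() returns each sorted row as a column, hence the t().
sorted <- as.data.frame(t(apply(df3, 1, sort)), stringsAsFactors = FALSE)

# Give the sorted positions neutral names; the original column names no
# longer apply once the values have been reordered.
names(sorted) <- paste0("item", seq_along(sorted))

# Count each distinct sorted row.
result <- sorted %>% group_by_all() %>% count()
print(result)
```

This assumes there are no missing values: sort() drops NA by default, so rows containing NA would come back shorter than the others and the result of apply() would no longer be a matrix.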