Given the following data, I would like to know how many unique partners each fruit has.
My df:
fruit1 fruit2
1 guava kiwi
2 lemon pear
3 pear apple
4 guava kiwi
5 pear guava
6 apple kiwi
7 banana lemon
8 lemon kiwi
9 apple banana
10 lemon guava
I am trying to get to grips with dplyr and tidyr, so I thought it would be good to use n_distinct() from dplyr. I did the following:
rbind(df %>% select(fruita = fruit1, fruitb = fruit2),
      df %>% select(fruita = fruit2, fruitb = fruit1)) %>%
  group_by(fruita) %>%
  summarise(Partners = n_distinct(fruitb)) %>%
  arrange(desc(Partners))
This essentially stacks a second copy of the 10 rows underneath the original, with the order of the fruits switched in the bottom half. I then count, for each fruit in the new first column, how many unique partner fruits it has in the new second column using n_distinct().
This works fine, but given how elegant dplyr and tidyr are, I am wondering if there is a more efficient way of doing this, and especially whether there is a way of performing an rbind such as this using one of these packages.
The final data look like this:
fruita Partners
1 lemon 4
2 apple 3
3 guava 3
4 pear 3
5 kiwi 3
6 banana 2
Data for reproducing:
df <- structure(list(fruit1 = structure(c(3L, 4L, 5L, 3L, 5L, 1L, 2L,
4L, 1L, 4L), .Label = c("apple", "banana", "guava", "lemon",
"pear"), class = "factor"), fruit2 = structure(c(4L, 6L, 1L,
4L, 3L, 4L, 5L, 4L, 2L, 3L), .Label = c("apple", "banana", "guava",
"kiwi", "lemon", "pear"), class = "factor")), .Names = c("fruit1",
"fruit2"), class = "data.frame", row.names = c(NA, -10L))
Not sure if this helps:
df %>%
  do(data.frame(fruita = unlist(.), fruitb = unlist(.[, 2:1]))) %>%
  group_by(fruita) %>%
  summarise(Partners = n_distinct(fruitb)) %>%
  arrange(desc(Partners))
#Source: local data frame [6 x 2]
# fruita Partners
# 1 lemon 4
# 2 apple 3
# 3 guava 3
# 4 pear 3
# 5 kiwi 3
# 6 banana 2
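On the specific question of doing the rbind step within dplyr: bind_rows() is dplyr's row-binding function, so one possible variant is sketched below. This is only a sketch, not necessarily more efficient than the original. It assumes the fruit columns are stored as character vectors rather than factors (the example data is rebuilt that way here), and the name `partners` is just an illustrative choice.

```r
library(dplyr)

# Example data from the question, rebuilt with character columns
# (an assumption; the original data uses factors with different level sets).
df <- data.frame(
  fruit1 = c("guava", "lemon", "pear", "guava", "pear",
             "apple", "banana", "lemon", "apple", "lemon"),
  fruit2 = c("kiwi", "pear", "apple", "kiwi", "guava",
             "kiwi", "lemon", "kiwi", "banana", "guava"),
  stringsAsFactors = FALSE
)

# Stack the data on top of a copy with the two columns swapped,
# so that every fruit appears in fruita alongside each of its partners.
partners <- bind_rows(
  select(df, fruita = fruit1, fruitb = fruit2),
  select(df, fruita = fruit2, fruitb = fruit1)
) %>%
  group_by(fruita) %>%
  summarise(Partners = n_distinct(fruitb)) %>%  # count unique partners per fruit
  arrange(desc(Partners))

print(partners)
```

The counts should match the table above; only the binding step differs from the original rbind() version.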