I have a table of pairs, where each pair represents two genetically identical individuals. I use letters to label the individuals. For example, a, x, y and b are all the same individual!
Mate1 Mate2
a x
a y
b y
c z
d l
d j
d m
j n
f o
f p
f q
f r
As you can see, Mate1 can have multiple matches in Mate2, and vice versa. I would like to obtain this:
Mate1 Mate2 Mate3 Mate4 Mate5
a x y b
c z
d l m j n
f o p q r
The idea is: I want one row per group of individuals, but this sometimes means linking pairs through Mate1 or Mate2, several times over. For example, a is linked to b through the intermediate y. In my real dataset, there could be many more intermediates like y. I would like all members of a group to end up in one row (or, if it is easier, to add a new column with a 'group' ID).
Any ideas on how to do that? Many thanks!
I have already tried many combinations of tidyverse functions such as spread, unite and group_by, but without success. I struggle to get something robust and complete.
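You can use the igraph package for this task. Treat each pair as an edge of an undirected graph; each group of identical individuals is then a connected component:
library(igraph)
sort(components(graph_from_data_frame(df, directed = FALSE))$membership)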
a b x y c z d j l m n f o p q r
1 1 1 1 2 2 3 3 3 3 3 4 4 4 4 4
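Since the question also mentions a 'group' ID column as an acceptable result, here is a minimal sketch of that variant. It assumes df has the two character columns Mate1 and Mate2 shown in the question; the df built below and the column name group are only illustrative.

```r
library(igraph)

# Example pairs from the question (assumed to be character columns)
df <- data.frame(
  Mate1 = c("a", "a", "b", "c", "d", "d", "d", "j", "f", "f", "f", "f"),
  Mate2 = c("x", "y", "y", "z", "l", "j", "m", "n", "o", "p", "q", "r"),
  stringsAsFactors = FALSE
)

# Each pair is an undirected edge; membership maps individual -> component ID
g <- graph_from_data_frame(df, directed = FALSE)
membership <- components(g)$membership

# Both mates of a pair sit in the same component, so looking up Mate1 is enough
df$group <- unname(membership[df$Mate1])
df
```

This keeps the table in its original long shape, which may be easier to join back onto other per-pair data than the wide layout.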
If you want to further match your desired output, you can add dplyr and tidyr:
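library(dplyr)
library(tidyr)
library(tibble)  # provides enframe()

pairs <- sort(components(graph_from_data_frame(df, directed = FALSE))$membership)
pairs %>%
  enframe() %>%                                        # columns: name (individual), value (group ID)
  group_by(value) %>%
  mutate(variable = paste0("Mate", row_number())) %>%  # number the members within each group
  ungroup() %>%
  spread(variable, name) %>%                           # one row per group, one column per member
  select(-value)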
Mate1 Mate2 Mate3 Mate4 Mate5
<chr> <chr> <chr> <chr> <chr>
1 a b x y <NA>
2 c z <NA> <NA> <NA>
3 d j l m n
4 f o p q r
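In recent tidyr versions spread() is superseded by pivot_wider(), so the same reshaping could be sketched as follows. This is an alternative under assumptions, not part of the answer above: it assumes tidyr 1.0 or later, rebuilds the example df from the question, and the names individual, group, slot and wide are purely illustrative.

```r
library(igraph)
library(dplyr)
library(tidyr)
library(tibble)

# Example pairs from the question (assumed to be character columns)
df <- data.frame(
  Mate1 = c("a", "a", "b", "c", "d", "d", "d", "j", "f", "f", "f", "f"),
  Mate2 = c("x", "y", "y", "z", "l", "j", "m", "n", "o", "p", "q", "r"),
  stringsAsFactors = FALSE
)

# Component ID for every individual (named vector: individual -> group)
membership <- components(graph_from_data_frame(df, directed = FALSE))$membership

wide <- enframe(membership, name = "individual", value = "group") %>%
  arrange(group) %>%
  group_by(group) %>%
  mutate(slot = paste0("Mate", row_number())) %>%  # Mate1, Mate2, ... within each group
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = individual) %>%
  select(-group)
wide
```

By default pivot_wider() orders the new columns by first appearance, so the Mate columns should stay in numeric order even when a group has ten or more members.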