Create a column indicating shared unique cluster ID in R

Tags:

r

I'd like to create a column that assigns a shared CoupleID whenever ID and PartnerID contain the same pair of values in either order, i.e. the values are swapped across the two columns rather than duplicated within one column. Existing questions and answers (e.g. "Add ID column by group") seem to cover only cases where values are duplicated within the same column. Any help would be much appreciated!

This is what I have:

> tibble(df)
# A tibble: 6 × 2
  ID    PartnerID
  <chr> <chr>
1 1     2
2 2     1
3 3     4
4 4     3
5 5     6
6 6     5

This is what I want:

> tibble(df2)
# A tibble: 6 × 3
  ID    PartnerID CoupleID
  <chr> <chr>     <chr>
1 1     2         1
2 2     1         1
3 3     4         2
4 4     3         2
5 5     6         3
6 6     5         3

Data

df <- data.frame(ID        = c("1", "2", "3", "4", "5", "6"),
                 PartnerID = c("2", "1", "4", "3", "6", "5"))

df2 <- data.frame(ID        = c("1", "2", "3", "4", "5", "6"),
                  PartnerID = c("2", "1", "4", "3", "6", "5"),
                  CoupleID  = c("1", "1", "2", "2", "3", "3"))
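The rule described above amounts to treating each row as an unordered pair. A minimal base R sketch of that idea, assuming the `df` from the Data block (redefined here so the snippet runs on its own): build a key from the smaller and larger value of each row with `pmin()`/`pmax()`, then number the keys in order of first appearance. `key` is an illustrative name, not something from the question.

```r
df <- data.frame(ID        = c("1", "2", "3", "4", "5", "6"),
                 PartnerID = c("2", "1", "4", "3", "6", "5"))

# Order-independent key: smaller value first, then the larger one,
# so (1, 2) and (2, 1) both become "1_2". pmin()/pmax() also accept
# character vectors; string comparison is fine for building a key.
key <- paste(pmin(df$ID, df$PartnerID), pmax(df$ID, df$PartnerID), sep = "_")

# Number the distinct keys in order of first appearance.
df$CoupleID <- match(key, unique(key))
df
```

The `"_"` separator keeps distinct pairs such as ("1", "12") and ("11", "2") from producing the same key.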
squaregrace asked Nov 28 '25 18:11


1 Answer

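Try this: sort the two values in each row so that (1, 2) and (2, 1) produce the same key, then number the groups.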

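library(dplyr)

df |>
  rowwise() |>
  # order-independent key: "1_2" for both (1, 2) and (2, 1); the "_" separator
  # avoids collisions such as c("1", "12") vs c("11", "2"), which both give "112"
  mutate(g = paste(sort(c_across(ID:PartnerID)), collapse = "_")) |>
  group_by(g) |>
  mutate(CoupleID = cur_group_id()) |>
  ungroup() |> select(-g)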
Output:
# A tibble: 6 × 3
  ID    PartnerID CoupleID
  <chr> <chr>        <int>
1 1     2                1
2 2     1                1
3 3     4                2
4 4     3                2
5 5     6                3
6 6     5                3
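A vectorised variant may scale better on large data, since `rowwise()` evaluates row by row. This is a sketch, not the answer's method: it swaps `pmin()`/`pmax()` in for the per-row `sort()`, and `dplyr::dense_rank()` in for `group_by()` + `cur_group_id()`. Both number the distinct keys in sorted order rather than order of first appearance; for this data the result is the same.

```r
library(dplyr)

df <- data.frame(ID        = c("1", "2", "3", "4", "5", "6"),
                 PartnerID = c("2", "1", "4", "3", "6", "5"))

df2 <- df |>
  mutate(
    # vectorised order-independent key, no rowwise() needed
    g = paste(pmin(ID, PartnerID), pmax(ID, PartnerID), sep = "_"),
    # dense_rank() assigns 1, 2, 3, ... to the distinct keys in sorted order
    CoupleID = dense_rank(g)
  ) |>
  select(-g)
df2
```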
Mohamed Desouky answered Nov 30 '25 09:11


