Create a column indicating shared unique cluster ID in R

Question

I'd like to create a column that gives a unique CoupleID if ID and PartnerID contain the same values but in any combination, i.e. not in the same columns. The existing questions and answers seem to only refer to cases when the values are duplicated within the same columns, i.e Add ID column by group. Any help would be much appreciated!

This is what I have:

> tibble(df)
# A tibble: 6 × 2
  ID    PartnerID
   1       2        
   2       1        
   3       4        
   4       3        
   5       6        
   6       5

This is what I want:

> tibble(df2)
# A tibble: 6 × 3
  ID    PartnerID CoupleID
   1       2         1       
   2       1         1       
   3       4         2       
   4       3         2       
   5       6         3       
   6       5         3

Data

df <- data.frame (ID  = c("1", "2", "3", "4", "5", "6"),
                  PartnerID = c("2", "1", "4","3", "6", "5")
)

df2 <- data.frame (ID  = c("1", "2", "3", "4", "5", "6"),
                  PartnerID = c("2", "1", "4","3", "6", "5"),
                  CoupleID = c("1", "1", "2", "2", "3", "3")
)

Mohamed Desouky · Accepted Answer

Try this

library(dplyr)

df |> rowwise() |> mutate(g = paste0(sort(c_across(ID:PartnerID)) ,
collapse = "")) |> group_by(g) |> mutate(CoupleID = cur_group_id()) |>
ungroup() |> select(-g)

output

# A tibble: 6 × 3
  ID    PartnerID CoupleID
  <chr> <chr>        <int>
1 1     2                1
2 2     1                1
3 3     4                2
4 4     3                2
5 5     6                3
6 6     5                3

Create a column indicating shared unique cluster ID in R

Tags:

r

squaregrace

1 Answers

Mohamed Desouky

Recent Activity

Donate For Us

Create a column indicating shared unique cluster ID in R

Tags:

r

squaregrace

1 Answers

Mohamed Desouky

Related questions

Recent Activity

Donate For Us