Return IDs of unique combinations

Question

My data table has the following format

ID   Var1   Var2   Var3   ...
1_1  0      0      1      ...
1_2  1      1      0      ...
1_3  0      0      1      ...
...  ...    ...    ...    ...

I want to extract the IDs that belong to each unique combination of the Var columns. Getting the unique combinations is not the problem (plyr::count(), aggregate(), etc.); what I need are the ID values that contribute to each unique combination.

The output should look somewhat like this

Var1   Var2   Var3   IDs
0      0      1      1_1, 1_3
1      1      0      1_2

where the IDs column is a vector/list of all the IDs contributing to a unique combination.

I tried an R package and dplyr pipelines, but nothing has worked so far.

Any suggestions, or even R packages, for handling this task?

Thank you!

Sotos · Accepted Answer

You can use group_by_at with a selector that matches your column names, followed by summarise, i.e.

library(dplyr)

# group by every column whose name contains 'Var', then collapse the IDs
df %>%
  group_by_at(vars(contains('Var'))) %>%
  summarise(IDs = toString(ID))

which gives,

# A tibble: 2 x 4
# Groups:   Var1, Var2 [2]
   Var1  Var2  Var3 IDs     
  <int> <int> <int> <chr>   
1     0     0     1 1_1, 1_3
2     1     1     0 1_2
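In dplyr 1.0.0 and later the `_at` verbs are superseded by `across()`, so the same idea can be sketched as below. The sample data frame is an assumption that mirrors the three rows shown in the question, and `starts_with("Var")` assumes every grouping column name begins with `Var`.

```r
library(dplyr)

# Illustrative data mirroring the table in the question
df <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Group by all Var* columns and collapse each group's IDs into one string
res <- df %>%
  group_by(across(starts_with("Var"))) %>%
  summarise(IDs = toString(ID), .groups = "drop")

print(res)
```

Passing `.groups = "drop"` returns an ungrouped tibble, which avoids carrying the leftover grouping seen in the output above.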

iod · Answer

df %>% group_by_at(.vars=-1) %>% summarize(IDs=list(ID))

Similar to Sotos' solution, but it simplifies the column selection by grouping on everything except the first (ID) column, assuming all other columns define the unique combinations. The IDs column is then a list column rather than a single string.

# A tibble: 2 x 4
# Groups:   Var1, Var2 [2]
   Var1  Var2  Var3 IDs      
  <int> <int> <int> <list>   
1     0     0     1 <chr [2]>
2     1     1     0 <chr [1]>
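Because the tibble print only shows `<chr [2]>`, it may help to see how the list column can be inspected. The following is a minimal sketch on assumed sample data matching the question; it uses `across(-ID)` rather than `group_by_at`, which is a substitution on my part and should behave the same on dplyr 1.0.0 or later.

```r
library(dplyr)

# Illustrative data mirroring the table in the question
df <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Group by every column except ID and keep the IDs as a list column
res <- df %>%
  group_by(across(-ID)) %>%
  summarise(IDs = list(ID), .groups = "drop")

first_ids <- res$IDs[[1]]                               # IDs of the first combination
n_ids     <- lengths(res$IDs)                           # number of IDs per combination
as_text   <- vapply(res$IDs, toString, character(1))    # collapse to strings if needed
```

Each element of `res$IDs` is an ordinary character vector, so it can be passed directly to functions that expect one.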

Just for fun, you can further simplify it using tidyr's nest function:

library(tidyr)
nest(df, IDs = ID)
# A tibble: 2 x 4
   Var1  Var2  Var3 IDs                
  <int> <int> <int> <S3: vctrs_list_of>
1     0     0     1 1_1, 1_3           
2     1     1     0 1_2

This still leaves IDs as a list, which may or may not be useful for you, but displays it more clearly in the tibble. An extra benefit of keeping the column as a list rather than a string is that you can easily recreate the original table using unnest:

unnest(nest(df, IDs = ID), cols = IDs)
# A tibble: 3 x 4
   Var1  Var2  Var3 ID   
  <int> <int> <int> <chr>
1     0     0     1 1_1  
2     0     0     1 1_3  
3     1     1     0 1_2
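The round trip can be checked with a short sketch like the one below. The sample data is an assumption mirroring the question, and the `nest(df, IDs = ID)` syntax assumes tidyr 1.0.0 or later.

```r
library(tidyr)

# Illustrative data mirroring the table in the question
df <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

nested   <- nest(df, IDs = ID)           # one row per unique Var combination
restored <- unnest(nested, cols = IDs)   # back to one row per ID

# Same set of IDs and same number of rows as the original
same_ids  <- setequal(restored$ID, df$ID)
same_rows <- nrow(restored) == nrow(df)
```

Note that the restored table keeps the same rows but not the original layout: the ID column moves to the end and rows are ordered by combination, as in the output above.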

user2974951 · Answer

Using aggregate and unique from base R (here the data frame is called dat):

aggregate(dat$ID, list(dat$Var1, dat$Var2, dat$Var3), unique)
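With unnamed vectors, `aggregate` labels the grouping columns `Group.1`, `Group.2`, ... and the result `x`. A possible variant that keeps the original column names is sketched below; it passes data frame subsets instead of bare vectors and uses `toString` instead of `unique` to get one string per combination. The sample data is an assumption mirroring the question.

```r
# Illustrative data mirroring the table in the question
dat <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Named data frame subsets keep the column names in the result
res <- aggregate(
  dat["ID"],
  by  = dat[c("Var1", "Var2", "Var3")],
  FUN = toString
)

# Rename the aggregated column to match the desired output
names(res)[names(res) == "ID"] <- "IDs"
print(res)
```

The row order of the result is determined by `aggregate` and may differ from the dplyr output, so look rows up by their Var values rather than by position.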

Tags: r, data.table, dplyr

tebiwankenebi