My data table has the following format
ID   Var1   Var2   Var3   ...
1_1  0      0      1      ...
1_2  1      1      0      ...
1_3  0      0      1      ...
...  ...    ...    ...    ...
I want to extract the IDs that belong to each unique combination of the Var columns. Getting the unique combinations is not the problem (plyr::count(), aggregate(), etc.); what I need are the ID values that contribute to each of these combinations.
The output should look somewhat like this
Var1   Var2   Var3   IDs
0      0      1      1_1, 1_3
1      1      0      1_2
where the IDs column is a vector/list of all the IDs contributing to a unique combination.
I tried an R package and dplyr pipelines, but nothing has worked so far.
Any suggestions, or even R packages, for handling this task?
Thank you!
You can use group_by_at with the pattern that matches your column names, and summarise, i.e.
df %>% 
 group_by_at(vars(contains('Var'))) %>% 
 summarise(IDs = toString(ID))
which gives,
# A tibble: 2 x 4
# Groups:   Var1, Var2 [2]
   Var1  Var2  Var3 IDs     
  <int> <int> <int> <chr>   
1     0     0     1 1_1, 1_3
2     1     1     0 1_2     
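For a self-contained check, here is a minimal sketch that rebuilds the three example rows and groups them with across(), which newer dplyr releases recommend in place of group_by_at. It assumes dplyr 1.0.0 or later, and the data frame df is reconstructed from the rows shown in the question, not the asker's real data.

```r
library(dplyr)

# Example data reconstructed from the question (only the three rows shown).
df <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Group by every column whose name starts with "Var" and
# collapse the IDs of each group into one comma-separated string.
res <- df %>%
  group_by(across(starts_with("Var"))) %>%
  summarise(IDs = toString(ID), .groups = "drop")

print(res)
```

Passing .groups = "drop" returns an ungrouped tibble, so the "# Groups:" header no longer appears in the printed result.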
df %>% group_by_at(.vars=-1) %>% summarize(IDs=list(ID))
Similar to Sotos' solution, but it simplifies the selection by excluding the ID column by position (.vars=-1), which assumes ID is the first column and that every other column defines the combinations. The IDs column will be a list column rather than a string.
# A tibble: 2 x 4
# Groups:   Var1, Var2 [2]
   Var1  Var2  Var3 IDs      
  <int> <int> <int> <list>   
1     0     0     1 <chr [2]>
2     1     1     0 <chr [1]>
Just for fun, you can further simplify it using tidyr's nest function:
require(tidyr)
nest(df,IDs=ID)
# A tibble: 2 x 4
   Var1  Var2  Var3 IDs                
  <int> <int> <int> <S3: vctrs_list_of>
1     0     0     1 1_1, 1_3           
2     1     1     0 1_2   
This still leaves IDs as a list, which may or may not be useful for you, but displays it more clearly in the tibble. An extra benefit of keeping the column as a list rather than a string is that you can easily recreate the original table using unnest:
unnest(nest(df,IDs=ID),cols=IDs)
# A tibble: 3 x 4
   Var1  Var2  Var3 ID   
  <int> <int> <int> <chr>
1     0     0     1 1_1  
2     0     0     1 1_3  
3     1     1     0 1_2  
Using aggregate and unique:
aggregate(dat$ID,list(dat$Var1,dat$Var2,dat$Var3),unique)
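Because the grouping list above is unnamed, the result comes back with generic column names (Group.1, Group.2, Group.3 and x). As a sketch of a variant, the formula interface of aggregate keeps the original column names, and toString collapses each group's IDs into a single string. The data frame dat here is reconstructed from the three example rows in the question.

```r
# Example data reconstructed from the question (only the three rows shown).
dat <- data.frame(
  ID   = c("1_1", "1_2", "1_3"),
  Var1 = c(0L, 1L, 0L),
  Var2 = c(0L, 1L, 0L),
  Var3 = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# One row per unique Var1/Var2/Var3 combination; the IDs of each
# combination are joined into a comma-separated string.
res <- aggregate(ID ~ Var1 + Var2 + Var3, data = dat, FUN = toString)

# Rename the aggregated column to match the output requested in the question.
names(res)[names(res) == "ID"] <- "IDs"

print(res)
```

If a list of IDs per combination is preferred over a string, FUN = unique (as in the call above) or FUN = list may be a better fit, at the cost of a less readable printout.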