Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

collapsing rows in a data frame while merging another column value

I have this data sets

d1 = data.frame(PatID=c(1,1,1,2,2,4,1,2), 
                code1=c(1,2,3,1,2,7,2,8), 
                location=c('a','b','c','d','e','f','g','h'))

I want to eliminate duplicate rows (on PatID) so

  1. I get only one row for each unique PatID;
  2. merge values for code1 for all common rows,
  3. retain location for any one matching row (for first or last row - does not matter).

Output should be:

PatID    code1    location 
1        1,2,3    a 
2        1,2,8    d 
4        7        f 

I have tried unsuccessfully aggregate, ddply and even struggled with melt dcast. I am a former unix programmer but new to r.

like image 220
s dscientist Avatar asked Dec 30 '25 17:12

s dscientist


2 Answers

ddply works just fine:

ddply(d1,.(PatID),
      summarise,
      code1 = paste(unique(code1),collapse = ','),
      location = location[1])

  PatID code1 location
1     1 1,2,3        a
2     2 1,2,8        d
3     4     7        f

Oh fine. Here's the data.table version:

d2 <- as.data.table(d1)
> d2[,list(code1 = paste(unique(code1),collapse = ','),location = location[1]),by = 'PatID']
   PatID code1 location
1:     1 1,2,3        a
2:     2 1,2,8        d
3:     4     7        f
like image 100
joran Avatar answered Jan 02 '26 08:01

joran


Just to make sure base R isn't entirely ignored (or to make you appreciate the syntax of "plyr" and "data.table" for these types of problems)...

Two options:

Option 1: Use ave to do the "aggregation" and unique to slim down the output

unique(within(d1, {
  code1 <- ave(code1, PatID, 
               FUN=function(x) paste(unique(x), collapse = ","))
  location <- ave(location, PatID, FUN=function(x) x[1])
}))
#   PatID code1 location
# 1     1 1,2,3        a
# 4     2 1,2,8        d
# 6     4     7        f

Option 2: Get aggregate and merge working together

merge(
  aggregate(code1 ~ PatID, d1, 
          function(x) paste(unique(x), collapse = ",")),
  aggregate(location ~ PatID, d1, function(x) x[1]))
#   PatID code1 location
# 1     1 1,2,3        a
# 2     2 1,2,8        d
# 3     4     7        f

The closest purely aggregate solution I can think of is as follows:

aggregate(cbind(code1, as.character(location)) ~ PatID, d1, 
          function(x) cbind(paste(unique(x), collapse = ","),
                            as.character(x[1])))
#   PatID code1.1 code1.2    V2.1 V2.2
# 1     1   1,2,3       1 a,b,c,g    a
# 2     2   1,2,8       1   d,e,h    d
# 3     4       7       7       f    f

It gives you all the information you were interested in, and a decent amount of information you weren't interested in too...

like image 21
A5C1D2H2I1M1N2O1R2T1 Avatar answered Jan 02 '26 10:01

A5C1D2H2I1M1N2O1R2T1



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!