Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregating by unique identifier and concatenating related values into a string [duplicate]

I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out.

I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to de-dupicate by the name (brand) and concatenate the multiple possible id's into a string separated by a comment.

For example:

brand            id 
RadioShack       2308
Rag & Bone       4466
Ragu             1830
Ragu             4518
Ralph Lauren     1638
Ralph Lauren     2719
Ralph Lauren     2720
Ralph Lauren     2721
Ralph Lauren     2722 

should become:

RadioShack       2308
Rag & Bone       4466
Ragu             1830,4518
Ralph Lauren     1638,2719,2720,2721,2722

How would I accomplish this?

like image 310
roody Avatar asked May 16 '13 20:05

roody


3 Answers

Let's call your data.frame DF

> aggregate(id ~ brand, data = DF, c)
         brand                           id
1   RadioShack                         2308
2   Rag & Bone                         4466
3         Ragu                   1830, 4518
4 Ralph Lauren 1638, 2719, 2720, 2721, 2722

Another alternative using aggregate is:

result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")

This produces the same result and now id is not a list anymore. Thanks to @Frank comment. To see the class of each column try:

> sapply(result, class)
      brand          id 
   "factor" "character"

As mentioned by @DavidArenburg in the comments, another alternative is using the toString function:

aggregate(id ~ brand, data = DF, toString)
like image 197
Jilber Urbina Avatar answered Nov 03 '22 14:11

Jilber Urbina


A nice clean one line in data.table

library(data.table)
setDT(DF)

TWO OPTIONS:

results as a list

DF[ , .(id = list(id)), by = brand]
          brand                       id
1:   RadioShack                     2308
2:   Rag & Bone                     4466
3:         Ragu                1830,4518
4: Ralph Lauren 1638,2719,2720,2721,2722
> 

results as a string

DF[ , .(id = paste(id, collapse=",")), by = brand]
          brand                       id
1:   RadioShack                     2308
2:   Rag & Bone                     4466
3:         Ragu                1830,4518
4: Ralph Lauren 1638,2719,2720,2721,2722

Note

Even though the two results appear the same (that is when you print them, they look identical), they are in fact very different and allow for different functionality.

Namely, using the list option (the first one) allows you to then perform functions on the orignal ids.

The latter will allow you to display the information more easily (including exporting to CSV or excel), but to operate on the id's will require splicing them back.

like image 44
Ricardo Saporta Avatar answered Nov 03 '22 14:11

Ricardo Saporta


Or using dplyr:

library(dplyr)
DF %>%
  group_by(brand) %>%
  summarise(id = paste(id, collapse = ","))

Where DF is the name of your data.frame.

like image 32
Sam Firke Avatar answered Nov 03 '22 14:11

Sam Firke