I am having difficulty replicating the functionality of a typical SQL GROUP_CONCAT function in dplyr. I would also like to make sure the ordering inside the groups can be controlled. Ideally I want to use the hadleyverse/tidyverse but base R or other packages will work too.
Example data:
ID name
1 apple
1 orange
2 orange
3 orange
3 apple
Desired output:
ID name
1 apple,orange
2 orange
3 apple,orange
Note that for ID=3, the ordering is in alpha order, not how the rows are ordered. I think this can probably be handled by doing an arrange
first, but it would be nice to control inside the summarise
statement or the like.
In R
, we can use one of the group by operations.
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise(name = toString(sort(unique(name))))
# ID name
# <int> <chr>
#1 1 apple, orange
#2 2 orange
#3 3 apple, orange
Or using data.table
library(data.table)
setDT(df1)[, .(name = toString(sort(unique(name)))), by = ID]
# ID name
#1: 1 apple, orange
#2: 2 orange
#3: 3 apple, orange
For base R use this
aggregate(data=df,name~ID,FUN = function(t) sort(paste(t)))
Data
df<-read.table(header = T,text = "ID name
1 apple
1 orange
2 orange
3 orange
3 apple")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With