I have a big data set with many variables that looks similar to this :
> data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
a b ID
1: a A 1
2: b B 1
3: c C 1
4: d D 2
5: e E 2
6: f F 2
7: g G 2
8: h H 3
9: i I 3
10: j J 3
I want to concatenate(with new line character between them) all column values except ID for each value of ID, so the result should look like this :
a b ID
1: a A 1
b B
c C
2: d D 2
e E
f F
g G
3: h H 3
i I
j J
I found a link R Dataframe: aggregating strings within column, across rows, by group which talks about how to do it for one column, how to extend this for all columns in .SD ?
To make it clear I changed the separator from \n
to ,
and the result should look like :
a b ID
1: a,b,c A,B,C 1
2: d,e,f,g D,E,F,G 2
3: h,i,j H,I,J 3
You can concatenate all columns in using lapply
.
dt[, lapply(.SD, paste0, collapse=" "), by = ID]
## ID a b
## 1: 1 a b c A B C
## 2: 2 d e f g D E F G
## 3: 3 h i j H I J
Using newline characters as a ollapse argument instead of " "
does work, but does not print as you seem to expect in your desired output.
dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
## ID a b
## 1: 1 a\nb\nc A\nB\nC
## 2: 2 d\ne\nf\ng D\nE\nF\nG
## 3: 3 h\ni\nj H\nI\nJ
As pointed out in the comments by @Frank, the question has been changed to have ,
as a seperator instead of \n
. Of course you can just change the collapse
argument to ","
. If you want to have a space as well ", "
, then the solution by @DavidArenburg is preferable.
dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With