
Cumulatively paste (concatenate) values grouped by another variable

Tags: dataframe, r

I am having trouble with a data frame in R. I would like to paste together the contents of cells in different rows, grouped by the values in another column. The catch is that I want the output to be cumulative: each row should contain the values of its group up to and including that row, so the output vector must be the same length as the input vector. Here is a sample table similar to the one I am dealing with:

id <- c("a", "a", "a", "b", "b", "b")
content <- c("A", "B", "A", "B", "C", "B")
(testdf <- data.frame(id, content, stringsAsFactors=FALSE))
#  id content
#1  a       A
#2  a       B
#3  a       A
#4  b       B
#5  b       C
#6  b       B

And this is what I want the result to look like:

result <- c("A", "A B", "A B A", "B", "B C", "B C B") 
result

#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"

What I do NOT need is something like this, which collapses each group into a single row:

library(plyr)
ddply(testdf, .(id), summarize, content_concatenated = paste(content, collapse = " "))

#  id content_concatenated
#1  a                A B A
#2  b                B C B

user3860074 asked Jul 21 '14 09:07


1 Answer

data.table solution

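library(data.table)
# setDT() converts testdf to a data.table by reference.
# Within each id, row x gets content[1..x] pasted together.
setDT(testdf)[, content2 := sapply(seq_len(.N), function(x) paste(content[seq_len(x)], collapse = " ")), by = id]
testdf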

##    id content content2
## 1:  a       A        A
## 2:  a       B      A B
## 3:  a       A    A B A
## 4:  b       B        B
## 5:  b       C      B C
## 6:  b       B    B C B
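
If data.table is not available, the same per-group cumulative paste can probably be done in base R. The following is only a sketch and not part of the original answer: it assumes the `testdf` from the question, reuses the column name `content2` for comparison, and swaps in `ave()` for the grouping and `Reduce(..., accumulate = TRUE)` for the running paste.

```r
# Rebuild the example data from the question
id <- c("a", "a", "a", "b", "b", "b")
content <- c("A", "B", "A", "B", "C", "B")
testdf <- data.frame(id, content, stringsAsFactors = FALSE)

# ave() applies FUN to content within each id and returns a vector
# of the same length as the input. Reduce() with accumulate = TRUE
# keeps every intermediate result of paste(), giving the running
# concatenation (paste() separates with a single space by default).
testdf$content2 <- ave(
  testdf$content, testdf$id,
  FUN = function(x) Reduce(paste, x, accumulate = TRUE)
)

testdf$content2
```

Because `Reduce()` builds each string from the previous one, this avoids re-pasting the whole prefix for every row, which may matter for large groups.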

David Arenburg answered Oct 23 '22 08:10