Rolling Text Concatenation with Data.Table in R

Tags: r, data.table

I have a dataset that looks like the following:

library(data.table)

rownum<-c(1,2,3,4,5,6,7,8,9,10)
name<-c("jeff","jeff","mary","jeff","jeff","jeff","mary","mary","mary","mary")
text<-c("a","b","c","d","e","f","g","h","i","j")
a<-data.table(rownum,name,text)

I would like to add a new column, rolltext, that concatenates the text values cumulatively within each name, in rownum order. The new column would be:

rolltext<-c("a","ab","c","abd","abde","abdef","cg","cgh","cghi","cghij")

I am at a loss here. For numbers I would just use the cumsum function, but for text I am thinking I would need a for loop or one of the apply functions?
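Since the question mentions a for loop, here is a minimal base-R loop sketch of that idea for comparison. It assumes rows should be visited in rownum order, and the helper names `out` and `last` are illustrative only. It is slower and wordier than a grouped data.table expression, but it makes the "cumsum for text" logic explicit.

```r
library(data.table)

a <- data.table(
  rownum = 1:10,
  name   = c("jeff", "jeff", "mary", "jeff", "jeff", "jeff", "mary", "mary", "mary", "mary"),
  text   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

setorder(a, rownum)            # the loop assumes rows are visited in rownum order
out  <- character(nrow(a))     # accumulated string for each row (hypothetical name)
last <- list()                 # most recent accumulated string per name (hypothetical name)

for (i in seq_len(nrow(a))) {
  nm   <- a$name[i]
  # first time we see a name there is nothing to prepend
  prev <- if (is.null(last[[nm]])) "" else last[[nm]]
  out[i] <- paste0(prev, a$text[i])
  last[[nm]] <- out[i]
}

a[, rolltext := out]
```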

asked Oct 31 '15 at 20:10 by adamstorer


2 Answers

Here's an idea using substring().

a[, rolltext := substring(paste(text, collapse = ""), 1, 1:.N), by = name]

which gives

    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
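Cutting the pasted string at positions 1:.N works here because every text value is exactly one character long. If the values could have different lengths, a sketch under that assumption is to cut at the cumulative character counts instead; the table `b` below is hypothetical data, not from the question.

```r
library(data.table)

# Hypothetical data where text values have different lengths
b <- data.table(
  name = c("jeff", "jeff", "mary", "jeff", "mary"),
  text = c("ab", "c", "xy", "de", "z")
)

# Cut at cumsum(nchar(text)) rather than 1:.N, so each row gets exactly
# the text values seen so far within its name group.
b[, rolltext := substring(paste(text, collapse = ""), 1, cumsum(nchar(text))), by = name]
```

With 1:.N instead, the first jeff row would get "a" rather than "ab".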

We might be able to speed this up a bit with the stringi package:

library(stringi)
a[, rolltext := stri_sub(stri_c(text, collapse = ""), length = 1:.N), by = name]
answered Nov 03 '22 at 15:11 by Rich Scriven


You can use Reduce with the accumulate option:

a[, rolltext := Reduce(paste0, text, accumulate = TRUE), by = name]

    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
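To see why this works, here is a small standalone illustration on plain vectors: Reduce() folds a function over a vector from left to right, and accumulate = TRUE keeps every intermediate result instead of only the last one. Using `+` in place of paste0 reproduces cumsum(), which is exactly the "cumsum for text" the question asks for.

```r
# Fold paste0 left to right, keeping each intermediate string:
# steps is c("a", "ab", "abd", "abde")
steps <- Reduce(paste0, c("a", "b", "d", "e"), accumulate = TRUE)
print(steps)

# The same pattern with `+` gives the numeric running sum,
# identical to cumsum(1:4)
running <- Reduce(`+`, 1:4, accumulate = TRUE)
print(running)
```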

Alternatively, as @DavidArenburg suggested, build each row's value with sapply:

a[, rolltext := sapply(1:.N, function(x) paste(text[1:x], collapse = '')), by = name]
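A variant of the same idea using vapply() with FUN.VALUE = character(1), which guarantees each result is a single string; this is a stylistic choice on my part, not part of the original suggestion. Like the sapply() version, it re-pastes the whole prefix for every row, so it does more work as groups grow.

```r
library(data.table)

a <- data.table(
  rownum = 1:10,
  name   = c("jeff", "jeff", "mary", "jeff", "jeff", "jeff", "mary", "mary", "mary", "mary"),
  text   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

# For the x-th row of each name group, paste together text[1..x];
# vapply enforces that every result is a length-one character value.
a[, rolltext := vapply(seq_len(.N),
                       function(x) paste(text[seq_len(x)], collapse = ""),
                       character(1)),
  by = name]
```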

Strictly speaking, this is a running (cumulative) concatenation; in R terminology a "rolling" operation, as in the question title, usually means one computed over a fixed-width moving window.
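To illustrate the distinction, here is a sketch of a genuinely rolling version, where each row concatenates only the last k text values within its name. The window width k = 2 and the column name rollwin are hypothetical choices for illustration.

```r
library(data.table)

a <- data.table(
  rownum = 1:10,
  name   = c("jeff", "jeff", "mary", "jeff", "jeff", "jeff", "mary", "mary", "mary", "mary"),
  text   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

k <- 2L  # hypothetical window width

# For the x-th row of a group, paste text[(x - k + 1)..x], clipped at the
# start of the group, so the window never reaches back more than k rows.
a[, rollwin := vapply(seq_len(.N),
                      function(x) paste(text[max(1L, x - k + 1L):x], collapse = ""),
                      character(1)),
  by = name]
```

For jeff this gives "a", "ab", "bd", "de", "ef" rather than the running "a", "ab", "abd", "abde", "abdef".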

answered Nov 03 '22 at 16:11 by Frank