I am looking for the equivalent of a cumulative sum in R, but for character strings instead of numbers: each element should be concatenated onto all the elements that come before it.
E.g. in the data frame "df", column A contains the input and column B the desired result:
        A                        B
1  banana                   banana
2   boats             banana boats
3     are         banana boats are
4 awesome banana boats are awesome
Currently I am solving this via the following loop:
df$B <- ""
for(i in 1:nrow(df)) {
  if (length(df[i-1,"A"]) > 0) {
    df$B[i] <- paste(df$B[i-1],df$A[i])
  } else {
    df$B[i] <- df$A[i]
  }
}
I wonder whether there exists a more elegant/faster solution.
(df$B <- Reduce(paste, as.character(df$A), accumulate = TRUE))
# [1] "banana" "banana boats" "banana boats are" "banana boats are awesome"
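For anyone who wants to try this directly, here is a minimal self-contained sketch. The data frame is rebuilt from the question's example, and `stringsAsFactors = FALSE` as well as the name `cum_csv` are my own assumptions, not part of the original post.

```r
# Rebuild the example input from the question (column A only).
df <- data.frame(A = c("banana", "boats", "are", "awesome"),
                 stringsAsFactors = FALSE)

# Reduce() folds paste() over the vector from left to right;
# accumulate = TRUE keeps every intermediate result, not just the last.
df$B <- Reduce(paste, as.character(df$A), accumulate = TRUE)

# A different separator can be supplied through an anonymous function.
# cum_csv is a hypothetical name used only for this illustration.
cum_csv <- Reduce(function(acc, x) paste(acc, x, sep = ", "),
                  df$A, accumulate = TRUE)

print(df)
print(cum_csv)
```

Because every intermediate result has length one, `Reduce` simplifies the output to a plain character vector, so it can be assigned straight into a data frame column.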
We can try
i1 <- sequence(seq_len(nrow(df1)))
tapply(df1$A[i1], cumsum(c(TRUE, diff(i1) <= 0)),
       FUN = paste, collapse = ' ')
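To see why the grouping works, it may help to print the intermediate vectors. This is only a sketch: it assumes `df1` holds the question's four words as a character column, and `grp` and `out` are names I introduced for illustration.

```r
# Assumed input: the question's example data.
df1 <- data.frame(A = c("banana", "boats", "are", "awesome"),
                  stringsAsFactors = FALSE)

# sequence() concatenates 1:1, 1:2, 1:3, 1:4 into one index vector.
i1 <- sequence(seq_len(nrow(df1)))
print(i1)

# A new group starts wherever the index stops increasing,
# so each run 1..k ends up with its own group id.
grp <- cumsum(c(TRUE, diff(i1) <= 0))
print(grp)

# Paste the words of each group together.
out <- tapply(df1$A[i1], grp, FUN = paste, collapse = " ")
print(out)
```

Note that this builds an index vector of length n(n+1)/2, so memory use grows quadratically with the number of rows.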
Or
i1 <- rep(seq(nrow(df1)), seq(nrow(df1)))
tapply(i1, i1, FUN = function(x)
       paste(df1$A[seq_along(x)], collapse = ' '))
I don't know if it's faster, but at least the code is shorter:
sapply(seq_along(df$A), function(x) paste(df$A[1:x], collapse = " "))
Thanks to Roland's comment, I realised that this is one of the rare occurrences where a for loop is useful, as it saves us the repeated indexing. It differs from the OP's in that it starts at 2, which removes the need for the if statement inside the loop.
# preallocate a character vector with one slot per row
res <- character(length(df1$A))
res[1] <- as.character(df1$A[1])
for(i in 2:length(df1$A)){
  res[i] <- paste(res[i-1], df1$A[i])
}
res
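The same loop can be wrapped in a small function so that it also copes with empty and length-one input, where `2:length(x)` would count backwards. This is only a sketch; `cumpaste` is a hypothetical helper name, not a base R function.

```r
# cumpaste() is a hypothetical helper, not part of base R.
cumpaste <- function(x, sep = " ") {
  x <- as.character(x)      # also covers factor columns
  n <- length(x)
  res <- character(n)       # preallocate the result
  if (n == 0L) return(res)
  res[1] <- x[1]
  # seq_len(n)[-1] is empty when n == 1, so the loop is skipped
  for (i in seq_len(n)[-1]) {
    res[i] <- paste(res[i - 1], x[i], sep = sep)
  }
  res
}

words <- c("banana", "boats", "are", "awesome")
print(cumpaste(words))
```

Each step reuses the previous result, so only one `paste` call of two strings is made per element.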