Suppose that I have this input :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 B
11 7 2014-06-17 2013-08-04 B
I would like to progressively concatenate the values of the str
column by the group variable ID
, as showed in the following output :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 A,B
11 7 2014-06-17 2013-08-04 A,B,B
I tried to use the ave()
function with this code :
within(table, {
Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
})
but it gives the following output, which is not exactly what I want :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B,B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A,B,B
10 7 2014-06-17 2013-08-04 A,B,B
11 7 2014-06-17 2013-08-04 A,B,B
Of course I'd like to avoid loops, as I work on a large database.
How about ave()
with Reduce()
. The Reduce()
function allows us to accumulate results as they are calculated. So if we run it with paste()
we can accumulate the pasted strings.
f <- function(x) {
Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
}
df$str <- with(df, ave(as.character(str), ID, FUN = f)
which gives the updated data frame df
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B, B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 A, B
11 7 2014-06-17 2013-08-04 A, B, B
Note: function(...) paste(..., sep = ", ")
could also be function(x, y) paste(x, y, sep = ", ")
. (Thanks Pierre Lafortune)
Here's a possible solution combining data.table
with an inner tapply
that seem to get you what you need (you can use paste
instead of toString
if you like, it just looks cleaner to me that way).
library(data.table)
setDT(df)[, Str := tapply(str[sequence(1:.N)], rep(1:.N, 1:.N), toString), by = ID]
df
# ID date_1 date_2 str Str
# 1: 1 2010-07-04 2008-01-20 A A
# 2: 2 2015-07-01 2011-08-31 C C
# 3: 3 2015-03-06 2013-01-18 D D
# 4: 4 2013-01-10 2011-08-30 D D
# 5: 5 2014-06-04 2011-09-18 B B
# 6: 5 2014-06-04 2011-09-18 B B, B
# 7: 6 2012-11-22 2011-09-28 C C
# 8: 7 2014-06-17 2013-08-04 A A
# 9: 7 2014-06-17 2013-08-04 B A, B
# 10: 7 2014-06-17 2013-08-04 B A, B, B
You may be able to improve it a bit using
setDT(df)[, Str := {Len <- 1:.N ; tapply(str[sequence(Len)], rep(Len, Len), toString)}, by = ID]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With