Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Progressive concatenation of a column by a group [duplicate]

Suppose that I have this input :

             ID     date_1      date_2     str
1            1    2010-07-04  2008-01-20   A
2            2    2015-07-01  2011-08-31   C
3            3    2015-03-06  2013-01-18   D
4            4    2013-01-10  2011-08-30   D
5            5    2014-06-04  2011-09-18   B
6            5    2014-06-04  2011-09-18   B
7            6    2012-11-22  2011-09-28   C
8            7    2014-06-17  2013-08-04   A
10           7    2014-06-17  2013-08-04   B
11           7    2014-06-17  2013-08-04   B

I would like to progressively concatenate the values of the str column by the group variable ID, as showed in the following output :

             ID     date_1      date_2     str
1            1    2010-07-04  2008-01-20   A
2            2    2015-07-01  2011-08-31   C
3            3    2015-03-06  2013-01-18   D
4            4    2013-01-10  2011-08-30   D
5            5    2014-06-04  2011-09-18   B
6            5    2014-06-04  2011-09-18   B,B
7            6    2012-11-22  2011-09-28   C
8            7    2014-06-17  2013-08-04   A
10           7    2014-06-17  2013-08-04   A,B
11           7    2014-06-17  2013-08-04   A,B,B

I tried to use the ave() function with this code :

within(table, {
  Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
})

but it gives the following output, which is not exactly what I want :

         ID      date_1     date_2      str
1         1    2010-07-04 2008-01-20     A
2         2    2015-07-01 2011-08-31     C
3         3    2015-03-06 2013-01-18     D
4         4    2013-01-10 2011-08-30     D
5         5    2014-06-04 2011-09-18     B,B
6         5    2014-06-04 2011-09-18     B,B
7         6    2012-11-22 2011-09-28     C
8         7    2014-06-17 2013-08-04     A,B,B
10        7    2014-06-17 2013-08-04     A,B,B
11        7    2014-06-17 2013-08-04     A,B,B

Of course I'd like to avoid loops, as I work on a large database.

like image 599
Liam S. Avatar asked Jan 13 '16 22:01

Liam S.


2 Answers

How about ave() with Reduce(). The Reduce() function allows us to accumulate results as they are calculated. So if we run it with paste() we can accumulate the pasted strings.

f <- function(x) {
    Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
}

df$str <- with(df, ave(as.character(str), ID, FUN = f)

which gives the updated data frame df

   ID     date_1     date_2     str
1   1 2010-07-04 2008-01-20       A
2   2 2015-07-01 2011-08-31       C
3   3 2015-03-06 2013-01-18       D
4   4 2013-01-10 2011-08-30       D
5   5 2014-06-04 2011-09-18       B
6   5 2014-06-04 2011-09-18    B, B
7   6 2012-11-22 2011-09-28       C
8   7 2014-06-17 2013-08-04       A
10  7 2014-06-17 2013-08-04    A, B
11  7 2014-06-17 2013-08-04 A, B, B

Note: function(...) paste(..., sep = ", ") could also be function(x, y) paste(x, y, sep = ", "). (Thanks Pierre Lafortune)

like image 95
Rich Scriven Avatar answered Sep 27 '22 22:09

Rich Scriven


Here's a possible solution combining data.table with an inner tapply that seem to get you what you need (you can use paste instead of toString if you like, it just looks cleaner to me that way).

library(data.table)
setDT(df)[, Str := tapply(str[sequence(1:.N)], rep(1:.N, 1:.N), toString), by = ID]
df
#     ID     date_1     date_2 str     Str
#  1:  1 2010-07-04 2008-01-20   A       A
#  2:  2 2015-07-01 2011-08-31   C       C
#  3:  3 2015-03-06 2013-01-18   D       D
#  4:  4 2013-01-10 2011-08-30   D       D
#  5:  5 2014-06-04 2011-09-18   B       B
#  6:  5 2014-06-04 2011-09-18   B    B, B
#  7:  6 2012-11-22 2011-09-28   C       C
#  8:  7 2014-06-17 2013-08-04   A       A
#  9:  7 2014-06-17 2013-08-04   B    A, B
# 10:  7 2014-06-17 2013-08-04   B A, B, B

You may be able to improve it a bit using

setDT(df)[, Str := {Len <- 1:.N ; tapply(str[sequence(Len)], rep(Len, Len), toString)}, by = ID]
like image 34
David Arenburg Avatar answered Sep 27 '22 21:09

David Arenburg