Collapse data.table column values while grouping

Given a data.table object, I would like to collapse the values of some columns within each group into a single object and insert the resulting objects into a new column.

library(data.table)

dt <- data.table(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0)
)
dt
#    GROUPING NAME VALUE
# 1:      A|A    0  22.7
# 2:      B|A    0   1.2
# 3:      A|A    1   0.3
# 4:      B|A    1   0.4
# 5:      A|B    0   0.0

I think that to do this it is first necessary to specify the column to group by, so I should start with something like dt[, OBJECTS := <expr>, by = GROUPING].

Unfortunately, I don't know which expression <expr> to use to get the following result:

#    GROUPING   OBJECTS
# 1:      A|A  <vector>
# 2:      B|A  <vector>
# 3:      A|B  <vector>

Each <vector> must contain the values of the other columns. E.g. the first <vector> has to be a named vector equivalent to:

eg <- c(22.7, 0.3)
names(eg) <- c('0', '1')
#    0    1 
# 22.7  0.3
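To make the target concrete, here is a base-R sketch (no data.table involved) that builds the same per-group named vectors with split() and setNames(). The names d, grp and objects are illustrative only; the factor levels are set to first-appearance order so that the groups come out in the same order as with data.table's by.

```r
# Same data as above, as a plain data.frame (base R only).
d <- data.frame(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance, so split() does not sort the groups.
grp <- factor(d$GROUPING, levels = unique(d$GROUPING))

# One named numeric vector per group: values from VALUE, names from NAME
# (the numeric NAME values are coerced to the character names "0", "1").
objects <- lapply(split(d, grp), function(g) setNames(g$VALUE, g$NAME))

objects[['A|A']]
#    0    1 
# 22.7  0.3 
```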
asked Apr 17 '13 at 04:04 by leodido



2 Answers

Working inside j: if you want the values of a column to be a vector, you need to wrap the output in list().

Since the list returned by j itself defines the output columns, your final expression will be a nested list, e.g.:

dt[, list(allNames=list(NAME), allValues=list(VALUE)), by=GROUPING]

#    GROUPING allNames allValues
# 1:      A|A      0,1  22.7,0.3
# 2:      B|A      0,1   1.2,0.4
# 3:      A|B        0         0
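
To see why the extra list() matters, compare j with and without it on the question's data. This is a small sketch; flat and nested are illustrative names.

```r
library(data.table)

dt <- data.table(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0)
)

# Plain vector inside list(): each element becomes its own row (5 rows).
flat <- dt[, list(allValues = VALUE), by = GROUPING]

# Vector wrapped in a second list(): the whole group is stored in one
# cell of a list column (3 rows, one per group).
nested <- dt[, list(allValues = list(VALUE)), by = GROUPING]

nrow(flat)              # 5
nrow(nested)            # 3
nested$allValues[[1]]   # [1] 22.7  0.3
```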

As @Mnel points out, the following is equivalent (except that the resulting list columns keep the names NAME and VALUE):

dt[, lapply(.SD, list), by=GROUPING]
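Inside j, .SD holds the non-grouping columns (here NAME and VALUE), so lapply(.SD, list) wraps each of them in list() in one go. A quick sketch checking that both forms hold the same cells; explicit and viaSD are illustrative names.

```r
library(data.table)

dt <- data.table(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0)
)

explicit <- dt[, list(allNames = list(NAME), allValues = list(VALUE)), by = GROUPING]
viaSD    <- dt[, lapply(.SD, list), by = GROUPING]

# Column names come from .SD, so they stay NAME and VALUE.
names(viaSD)

# Same cell contents; only the column names differ.
identical(lapply(explicit$allValues, as.numeric), lapply(viaSD$VALUE, as.numeric))
```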

If you want it in long form, the structure of your <expr> will be list(c(list(), list(), ..., list())), e.g.:

dt[, list(c(list(NAME), list(VALUE))), by=GROUPING]

#    GROUPING       V1
# 1:      A|A      0,1
# 2:      A|A 22.7,0.3
# 3:      B|A      0,1
# 4:      B|A  1.2,0.4
# 5:      A|B        0
# 6:      A|B        0

Or equivalently:

dt[, list(lapply(.SD, c)), by=GROUPING]
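In the long form the V1 cells no longer say which column they came from; the NAME vector and the VALUE vector simply alternate within each group. One way to keep that information, sketched here as a variation rather than part of the original answer (COLUMN is an illustrative name), is to return names(.SD) alongside the collapsed columns:

```r
library(data.table)

dt <- data.table(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0)
)

# Two rows per group: one holding the NAME vector, one holding the VALUE vector.
# COLUMN records which original column each list cell was built from.
long <- dt[, list(COLUMN = names(.SD), V1 = lapply(.SD, c)), by = GROUPING]

long$COLUMN   # "NAME" and "VALUE", alternating for each group
```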
answered Oct 22 '22 at 22:10 by Ricardo Saporta



I think that this is what you are looking for:

dt1 <- dt[, list(list(setNames(VALUE, NAME))), by = GROUPING]
dt1
#    GROUPING       V1
# 1:      A|A 22.7,0.3
# 2:      B|A  1.2,0.4
# 3:      A|B        0
str(dt1)
# Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
# $ GROUPING: chr  "A|A" "B|A" "A|B"
# $ V1      :List of 3
#  ..$ : Named num  22.7 0.3
#  .. ..- attr(*, "names")= chr  "0" "1"
#  ..$ : Named num  1.2 0.4
#  .. ..- attr(*, "names")= chr  "0" "1"
#  ..$ : Named num 0
#  .. ..- attr(*, "names")= chr "0"
# - attr(*, ".internal.selfref")=<externalptr> 
dt1$V1
# [[1]]
#    0    1 
# 22.7  0.3 
# 
# [[2]]
#   0   1 
# 1.2 0.4 
# 
# [[3]]
# 0 
# 0 
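To get the column name the question asks for (OBJECTS), name the element in j instead of relying on the default V1. Looking up one group's named vector is then ordinary subsetting of a list column. A short usage sketch; aa is an illustrative name:

```r
library(data.table)

dt <- data.table(
  GROUPING = c('A|A', 'B|A', 'A|A', 'B|A', 'A|B'),
  NAME     = c(0, 0, 1, 1, 0),
  VALUE    = c(22.7, 1.2, 0.3, 0.4, 0.0)
)

dt1 <- dt[, list(OBJECTS = list(setNames(VALUE, NAME))), by = GROUPING]

# Selecting a list column returns a list, so take [[1]] to get the vector itself.
aa <- dt1[GROUPING == 'A|A', OBJECTS][[1]]
aa
#    0    1 
# 22.7  0.3 
aa[['1']]   # [1] 0.3
```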

As @Arun points out in the comments, the data.table alternative to setNames in this case is setattr(VALUE, 'names', NAME), which gives another solution:

dt1 <- dt[, list(list(setattr(VALUE, 'names', NAME))), by = GROUPING]
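The practical difference is that setNames() returns a modified copy, while data.table's setattr() changes the attribute on the object it is given, by reference, and returns that object. A minimal standalone illustration, outside any grouping, with v and w as illustrative names:

```r
library(data.table)

v <- c(22.7, 0.3)

# setNames() works on a copy: w gets names, v is left untouched.
w <- setNames(v, c('0', '1'))
names(v)   # NULL

# setattr() modifies v itself by reference; no reassignment is needed.
setattr(v, 'names', c('0', '1'))
names(v)   # [1] "0" "1"
```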
answered Oct 22 '22 at 20:10 by A5C1D2H2I1M1N2O1R2T1
