Setting multiple and different attributes for columns of a data.table

data.table has the elegant setattr for in-place addition of a single attribute to a column. Is there an elegant way to overlay multiple attributes in one step? For example, suppose that a data.table has many columns and I want to assign two attributes to column x1 and three attributes to column x3 as might be specified in the following list:

a <- list(x1=list(label='X1', units='mm'),
          x3=list(label='X3', comment='collected remotely', format='type 3'))

I could easily write code that processes a and calls setattr 5 times to accomplish this. But I'm hoping there is a better way.

Frank Harrell asked Oct 15 '22 21:10


2 Answers

I don't know whether the following code counts as elegant, but it works. It is a double *apply loop.
Quoting the question:

I could easily write code that processes a and calls setattr 5 times to accomplish this. But I'm hoping there is a better way.

The problem is that the name argument of setattr must be a length-1 character string, so setattr will always have to be called 5 times, once per attribute. In the code below those calls are made inside a double loop.

The example data.table comes from the 3rd DT in help("setattr").

library(data.table)

DT <- data.table(x1 = 1:3, y = 4:6, x3 = 7:9)
a <- list(x1=list(label='X1', units='mm'),
          x3=list(label='X3', comment='collected remotely', format='type 3'))

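# Outer mapply: one iteration per column named in `a`.
# Inside the function, `x` is the column name and `a` is that column's
# attribute list (it shadows the global `a`); the inner lapply calls
# setattr once per attribute.
mapply(function(x, a){
  lapply(names(a), function(na) setattr(DT[[x]], na, a[[na]]))
}, names(a), a)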

attributes(DT$x1)
#$label
#[1] "X1"
#
#$units
#[1] "mm"

attributes(DT$x3)
#$label
#[1] "X3"
#
#$comment
#[1] "collected remotely"
#
#$format
#[1] "type 3"

Note. To suppress the printed return values of the loops, wrap them in invisible:

invisible(
  mapply(function(x, a){
    lapply(names(a), function(na) setattr(DT[[x]], na, a[[na]]))
  }, names(a), a)
)

Edit

The following code is simpler. It uses two nested lapply calls, and its return value can likewise be wrapped in invisible.

lapply(names(a), function(x){
  lapply(names(a[[x]]), function(y) setattr(DT[[x]], y, a[[x]][[y]]))
})
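The same double loop can also be written with plain for loops, which print nothing and so need no invisible wrapper. This is only a sketch: set_col_attrs is a hypothetical helper name, not a data.table function, and it assumes the attribute list has the same shape as a in the question.

```r
library(data.table)

DT <- data.table(x1 = 1:3, y = 4:6, x3 = 7:9)
a <- list(x1 = list(label = 'X1', units = 'mm'),
          x3 = list(label = 'X3', comment = 'collected remotely', format = 'type 3'))

# Hypothetical helper (not part of data.table): calls setattr once per
# column/attribute pair, modifying the columns by reference.
set_col_attrs <- function(dt, attr_list) {
  for (col in names(attr_list)) {
    for (nm in names(attr_list[[col]])) {
      setattr(dt[[col]], nm, attr_list[[col]][[nm]])
    }
  }
  invisible(dt)  # return the table invisibly so nothing is printed
}

set_col_attrs(DT, a)
attributes(DT$x3)
```

setattr is still called five times here; the helper only hides the loop behind one call.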
Rui Barradas answered Oct 20 '22 08:10


This may deviate too much from your desired output, but just to throw out an idea: because setattr accepts a data.table, an alternative may be to set the attributes at the data.table level, as a single named list whose elements are keyed by column name:

setattr(d, "all_attr", a)
str(d)
# Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
# $ x1: int  1 2 3
# $ y : int  4 5 6
# $ x3: int  7 8 9
# - attr(*, ".internal.selfref")=<externalptr> 
#   - attr(*, "all_attr")=List of 2
# ..$ x1:List of 2
# .. ..$ label: chr "X1"
# .. ..$ units: chr "mm"
# ..$ x3:List of 3
# .. ..$ label  : chr "X3"
# .. ..$ comment: chr "collected remotely"
# .. ..$ format : chr "type 3"
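With the attributes stored at the table level, reading one back means indexing into that list. A minimal sketch, assuming the "all_attr" attribute name used above; get_col_attr is a hypothetical accessor introduced only for illustration.

```r
library(data.table)

d <- data.table(x1 = 1:3, y = 4:6, x3 = 7:9)
a <- list(x1 = list(label = 'X1', units = 'mm'),
          x3 = list(label = 'X3', comment = 'collected remotely', format = 'type 3'))
setattr(d, "all_attr", a)

# Hypothetical accessor: look up one attribute of one column in the
# table-level "all_attr" list; returns NULL when nothing is stored.
get_col_attr <- function(dt, col, name) {
  attr(dt, "all_attr", exact = TRUE)[[col]][[name]]
}

get_col_attr(d, "x3", "format")
get_col_attr(d, "y", "label")   # column y has no entry in the list
```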

If you want the attributes set at the level of individual columns, and if you can live with attributes as a nested list, I think it may be enough to loop over the columns.

lapply(names(a), function(x) setattr(d[[x]], x, a[[x]]))
str(d)
# Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
# $ x1: int  1 2 3
# ..- attr(*, "x1")=List of 2
# .. ..$ label: chr "X1"
# .. ..$ units: chr "mm"
# $ y : int  4 5 6
# $ x3: int  7 8 9
# ..- attr(*, "x3")=List of 3
# .. ..$ label  : chr "X3"
# .. ..$ comment: chr "collected remotely"
# .. ..$ format : chr "type 3"
# - attr(*, ".internal.selfref")=<externalptr>
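In this layout each column carries one attribute, named after the column, that holds the whole list. Reading a single value back therefore takes two steps, roughly as sketched here (the variable names are illustrative only):

```r
library(data.table)

d <- data.table(x1 = 1:3, y = 4:6, x3 = 7:9)
a <- list(x1 = list(label = 'X1', units = 'mm'),
          x3 = list(label = 'X3', comment = 'collected remotely', format = 'type 3'))
invisible(lapply(names(a), function(x) setattr(d[[x]], x, a[[x]])))

# Step 1: fetch the nested list stored under the column's own name.
x1_attrs <- attr(d$x1, "x1", exact = TRUE)

# Step 2: pick an individual entry from that list.
x1_units <- x1_attrs[["units"]]
x1_units
```

Note that attr(d$x1, "units") is NULL under this scheme, because "units" is an element of the nested list rather than an attribute of the column.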

Data used in this answer (a is the list from the question):

library(data.table)
d = data.table(x1 = 1:3, y = 4:6, x3 = 7:9)
Henrik answered Oct 20 '22 07:10