.SD columns in data.table in R

Tags: r, data.table

When writing an expression that operates in j on a data.table, .SD doesn't contain all the columns in the table, only the ones that the expression uses. This is fine for running things but isn't awesome for debugging. What's the best way to see all the columns? I could pass all the names to .SDcols, but that seems fairly tedious. Example:

library(data.table)
x = data.table(a=1:10, b=10:1, id=1:5)
x[,{ browser(); a+1},by=id]
Called from: `[.data.table`(x, , {
    browser()
    a + 1
}, by = id)
Browse[1]> n
debug at #1: a + 1
Browse[1]> .SD
   a
1: 1
2: 6
Alex asked May 15 '13 18:05


1 Answer

To make all of .SD's columns available, you just need to reference it somewhere in your j expression. For example, try this:

x[,{.SD; browser(); a+1},by=id]
# Called from: `[.data.table`(x, , {
#     .SD
#     browser()
#     a + 1
# }, by = id)
Browse[1]> .SD
#    a  b
# 1: 1 10
# 2: 6  5

This works because, as explained here:

[.data.table() [...] previews the unevaluated j expression, and only adds to .SD columns that are referenced therein. If .SD itself is mentioned, it adds all of DT's columns.
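
The effect of mentioning .SD can also be checked without an interactive browser session. The following is a minimal sketch, assuming the same x as in the question; the names sd_cols and col are made up for illustration. Note that the grouping column (id) is not part of .SD.

```r
library(data.table)

x <- data.table(a = 1:10, b = 10:1, id = 1:5)

# j mentions .SD, so every non-grouping column of x is loaded into it.
# Collect the column names that .SD has within each group.
sd_cols <- x[, .(col = names(.SD)), by = id]

# The same columns appear in every group: "a" and "b", but not "id".
unique(sd_cols$col)
```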


Alternatively, if you don't want to incur the expense of loading all of .SD's columns for each by-group calculation, you can always inspect the currently loaded subset of x by calling x[.I,], where .I is a variable that stores the row locations in x of the current group:

x[,{browser(); a+1},by=id]
# Called from: `[.data.table`(x, , {
#     browser()
#     a + 1
# }, by = id)
Browse[1]> x[.I,]
#    a  b id
# 1: 1 10  1
# 2: 6  5  1
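
As a non-interactive illustration of what .I holds, the row locations can be recorded per group and then used to subset x outside the grouped call. This is only a sketch under the same setup as above; rows and row_in_x are hypothetical names chosen for the example.

```r
library(data.table)

x <- data.table(a = 1:10, b = 10:1, id = 1:5)

# Within each group, .I holds the row numbers of that group in x.
rows <- x[, .(row_in_x = .I), by = id]

# Row numbers belonging to the group id == 1.
idx <- rows[id == 1, row_in_x]

# Subsetting x by those rows returns all columns, including id.
x[idx]
```

For the group id == 1 this selects the same two rows that x[.I,] shows in the browser session above.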
Josh O'Brien answered Nov 15 '22 01:11