Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keyed lookup on data.table without 'with'

I have a data.table structure like so (except mine is really huge):

dt <- data.table(x=1:5, y=3:7, key='x')

I want to look up rows in that structure by another variable whose name is x (notice - the same as the name of the key of dt):

x <- 3:4
dt2 <- dt[ J(x) ]

This doesn't work, because the lookup sees the column name first, and the local variable is obscured:

dt2
#    x y
# 1: 1 3
# 2: 2 4
# 3: 3 5
# 4: 4 6
# 5: 5 7

I thought about the with argument for [.data.table, but that only applies to the j argument, not the i argument.

Is there something similar for the i argument?

If not, such a thing would be handy whenever I'm using a local variable and I don't know the complete list of column names in dt, to avoid conflicts.

like image 396
Ken Williams Avatar asked Feb 27 '13 00:02

Ken Williams


3 Answers

There is an item in the NEWS for 1.8.2 that suggests a ..() syntax will be added at some point, allowing this

New DT[.(...)] syntax (in the style of package plyr) is identical to DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so that .() and ..() are analogous to the file system's ./ and ../; i.e., .() evaluates within the frame of DT and ..() in the parent scope.

In the mean time, you can get from the appropriate environment

dt[J(get('x', envir = parent.frame(3)))]
##    x y
## 1: 3 5
## 2: 4 6

or you could eval the whole call to list(x) or J(x)

dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]
like image 55
mnel Avatar answered Sep 24 '22 03:09

mnel


New answer, now that I think I understand what was requested:

> X <- data.table(x=x)
> merge(dt, X)
   x y
1: 3 6
2: 4 7
like image 21
IRTFM Avatar answered Sep 23 '22 03:09

IRTFM


Setting a key is not required and it's faster:

dt[eval(dt[, x %in% ..x])]

   x y
1: 3 5
2: 4 6

Benchmark with the previously posted answers

microbenchmark(dt[eval(dt[, x %in% ..x])],
               dt[J(get('x', parent.frame(3)))],
               dt[eval(list(x))],
               dt[eval(J(x))],
               dt[eval(.(x))],
               merge(dt, data.table(x)),
               times = 100L)

Unit: microseconds
                                  expr    min      lq     mean  median      uq    max neval
      dt[eval(dt[, x %in% ..x])]  486.1  500.60  518.529  503.70  512.65 1238.0   100
dt[J(get("x", parent.frame(3)))]  837.3  853.25  891.424  860.00  868.30 1675.3   100
               dt[eval(list(x))]  831.8  842.70  929.521  851.95  859.85 3878.3   100
                  dt[eval(J(x))]  833.8  845.50  948.535  856.00  870.00 4599.2   100
                  dt[eval(.(x))]  828.6  846.40  871.054  851.75  859.35 1985.6   100
        merge(dt, data.table(x)) 1766.0 1804.70 1907.617 1819.95 1870.95 3123.1   100
like image 32
George Shimanovsky Avatar answered Sep 24 '22 03:09

George Shimanovsky