
S3 method dispatch in data.table when using `by` clause

Tags: r, data.table

Update: it appears this is an issue with data.table version 1.9.4 and not the most recent version of the package (1.9.6 as of this writing).

I have a table which I read in via fread like so:

library(data.table)
library(bit64)
dt = fread('"x","y"\n2489751247,"a"\n2492940518,"b"\n2444706811,"a"\n2408767228,"b"')

:>              x y
:>  1: 2489751247 a
:>  2: 2492940518 b
:>  3: 2444706811 a
:>  4: 2408767228 b

and I want the sum of x conditional on y, but data.table gives the wrong answer:

dt[,.(total=sum(x)),by=y]

:>     y         total
:>  1: a 2.437946e-314
:>  2: b 2.421765e-314

without the courtesy of a warning message. It turns out that x is of class integer64:

lapply(dt,class)

:>  $x
:>  [1] "integer64"
:>  $y
:>  [1] "character"
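For context, here is a minimal sketch of where numbers like 2.437946e-314 can come from. It assumes bit64's documented representation, in which an integer64 vector keeps its 64-bit integers in the storage of a double vector, so a sum() that bypasses the integer64 method adds the same bits read as doubles. The variable names are illustrative only, and this is not a claim about what data.table 1.9.4 does internally.

```r
library(bit64)

# The two x values from group "a" above.
x <- as.integer64(c(2489751247, 2444706811))

# Dropping the class exposes the raw storage: the same bits read as doubles.
raw_doubles <- unclass(x)
print(raw_doubles)       # tiny subnormal doubles, not the integers

# Summing the raw doubles mimics a sum() that skipped S3 dispatch.
print(sum(raw_doubles))

# With dispatch intact, sum() reaches sum.integer64 and adds the integers.
print(sum(x))
```

The magnitude of the second result (around 1e-314) is consistent with the wrong totals shown above, which is why the symptom looks like a dispatch problem rather than an arithmetic one.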

so I can do the S3 dispatch manually like so:

dt[,.(total=sum.integer64(x)),by=y]

:>     y      total
:>  1: a 4934458058
:>  2: b 4901707746

and for some reason using the class of x in the j clause causes data.table to give the correct answer:

dt[,.(total=sum(x),cls=class(x)),by=y]

:>     y      total       cls
:>  1: a 4934458058 integer64
:>  2: b 4901707746 integer64

which is weird. Is there some way to tell data.table to use S3 methods without using the class explicitly?

Jthorpe asked Sep 26 '22 11:09



1 Answer

This particular issue was specific to data.table version 1.9.4 and not the most recent version of the data.table package (1.9.6 as of this writing). You can inspect your version of data.table via:

installed.packages()['data.table','Version']

and if it's older than 1.9.6, upgrade by calling install.packages('data.table'). Note that if you are using a version of R provided by Revolution Analytics, you'll need to set the repos argument explicitly to your favorite CRAN mirror, because their most up-to-date repo (as of this writing) still has data.table version 1.9.4:

install.packages('data.table',repos="http://my.favorite.CRAN.mirror/")
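One caveat when checking "older than 1.9.6" in code: installed.packages() returns the version as a character string, and comparing version strings character by character can give the wrong answer. A minimal sketch follows, in which needs_upgrade is a hypothetical helper name and the default minimum of "1.9.6" is taken from the text above:

```r
# Hypothetical helper: TRUE when `installed` is older than `minimum`.
# package_version() compares version components numerically.
needs_upgrade <- function(installed, minimum = "1.9.6") {
  package_version(installed) < package_version(minimum)
}

print(needs_upgrade("1.9.4"))   # the affected version
print(needs_upgrade("1.10.0"))  # a later release

# For contrast, a plain string comparison looks at characters, not numbers.
print("1.10.0" < "1.9.6")
```

In practice the same comparison can be written against the installed package as packageVersion("data.table") < "1.9.6".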

Although I very rarely use Rprofile.site, I put these lines in that file on the machines I work with:

if( packageVersion("data.table") == package_version('1.9.4'))
    install.packages("data.table",lib=Sys.getenv("R_LIBS_USER"),repos='http://my.favorite.CRAN.mirror')
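After upgrading, a quick sanity check along these lines can confirm that the grouped sum behaves. This is a sketch that assumes both data.table and bit64 are installed; it rebuilds the table from the question with data.table() rather than fread, and the ok flag is just an illustrative name.

```r
library(data.table)
library(bit64)

# Same data as in the question, constructed directly.
dt <- data.table(
  x = as.integer64(c(2489751247, 2492940518, 2444706811, 2408767228)),
  y = c("a", "b", "a", "b")
)

# Grouped sum that went wrong under 1.9.4.
res <- dt[, .(total = sum(x)), by = y]

# Compare against the totals obtained with sum.integer64 in the question.
ok <- all(as.character(res$total) == c("4934458058", "4901707746"))
if (!ok) warning("grouped sum of integer64 looks wrong; check the data.table version")
print(res)
```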

Jthorpe answered Sep 30 '22 07:09
