
Speed decrease in subsetting `data.table` when adding a bracket

Tags: r, data.table

I recently noticed in some old code that I had been including an extra pair of square brackets when subsetting a data.table and applying a function repeatedly (in my case, calculating correlation matrices). So,

# Slow way
rcorr(DT[subgroup][, !'Group', with=FALSE])

# Faster way
rcorr(DT[subgroup, !'Group', with=FALSE])

(The only difference is the extra "][" after subgroup.) Just out of curiosity, why does this occur? With the extra brackets, does data.table have to perform some extra computations?

Chris Watson asked Jul 23 '15 23:07



1 Answer

Here's a simple interpretation:

# Slow way
rcorr(DT[subgroup][, !'Group'])

The second set of brackets is a separate operation: DT[subgroup] first creates a new data.table from DT (the selected rows, with every column), and then [, !'Group'] operates on that intermediate table, creating yet another data.table. The extra allocation and copy are what slow it down.

# Faster way
rcorr(DT[subgroup, !'Group'])

This form selects the rows and drops the column in a single call on DT, so no intermediate data.table is created.
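
To see the difference on your own machine, here is a minimal sketch. The data, the column names x, y, z and the logical vector used for subgroup are made up for illustration, and rcorr is left out so that only the subsetting is timed. with = FALSE is kept so the code also runs on older data.table versions.

```r
library(data.table)

set.seed(1)
n <- 1e5

# Hypothetical data: one grouping column plus three numeric columns
DT <- data.table(Group = sample(c("a", "b"), n, replace = TRUE),
                 x = rnorm(n), y = rnorm(n), z = rnorm(n))

# Assumption: subgroup is a logical row filter
subgroup <- DT$Group == "a"

# Two calls: DT[subgroup] copies the selected rows of every column,
# then the second bracket copies that result again without 'Group'
two_step <- DT[subgroup][, !"Group", with = FALSE]

# One call: rows and columns are selected together
one_step <- DT[subgroup, !"Group", with = FALSE]

# Check that both approaches return the same table
print(all.equal(two_step, one_step))

# Rough timing over repeated calls; exact numbers depend on the machine
print(system.time(for (i in 1:100) DT[subgroup][, !"Group", with = FALSE]))
print(system.time(for (i in 1:100) DT[subgroup, !"Group", with = FALSE]))
```

The gap should grow with the number of columns in DT, since the chained form copies all of them before discarding one.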

Rich Scriven answered Sep 28 '22 21:09
