I recently noticed in some old code that I had been including extra square brackets when subsetting a data.table and performing a function repeatedly (in my case, calculating correlation matrices). So,
# Slow way
rcorr(DT[subgroup][, !'Group', with=FALSE])
# Faster way
rcorr(DT[subgroup, !'Group', with=FALSE])
(The only difference is the extra ][ after subgroup.) Just out of curiosity, why is the first form slower? With the extra brackets, does data.table have to perform some extra computations?
Here's a simple interpretation:
# Slow way
rcorr(DT[subgroup][, !'Group'])
The second set of brackets is a separate operation: DT[subgroup] first builds a new data.table from DT containing the selected rows with every column, and [, !'Group'] then operates on that intermediate result, building yet another data.table. The extra copy is what causes the decline in speed.
# Faster way
rcorr(DT[subgroup, !'Group'])
Here the row subset and the column selection both happen in a single call on DT, so no intermediate data.table is created.
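To check this on your own machine, here is a minimal sketch. The data, the column names x1 to x3, and the logical vector standing in for subgroup are all made up for illustration (the question does not say what subgroup actually is), and rcorr is left out so that only the subsetting is timed.

```r
library(data.table)

set.seed(1)
n <- 1e5
# Hypothetical data: one grouping column plus three numeric columns.
DT <- data.table(Group = sample(c("a", "b"), n, replace = TRUE),
                 x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Assumption: subgroup is a logical row filter.
subgroup <- DT$Group == "a"

chained <- DT[subgroup][, !"Group"]  # two calls to `[`, with an intermediate table
single  <- DT[subgroup, !"Group"]    # one call: rows and columns selected together

# Both forms should return the same rows and columns.
print(isTRUE(all.equal(chained, single)))

# Rough timing over repeated calls; the numbers will vary by machine and data size.
print(system.time(for (k in 1:50) DT[subgroup][, !"Group"]))
print(system.time(for (k in 1:50) DT[subgroup, !"Group"]))
```

The gap should grow with the number of rows and columns, since the chained form copies every column of the subset before dropping Group.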