I recently noticed in some old code that I had been including extra square brackets when subsetting a data.table and performing a function repeatedly (in my case, calculating correlation matrices). So,
# Slow way
rcorr(DT[subgroup][, !'Group', with = FALSE])
# Faster way
rcorr(DT[subgroup, !'Group', with = FALSE])
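(The difference is what comes after subgroup: a closing and reopening bracket, ][, in the slow version versus a single comma in the fast one.) Just out of curiosity, why does this occur? With the extra brackets, does data.table have to perform some extra computations?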
Here's a simple interpretation:
# Slow way
rcorr(DT[subgroup][, !'Group'])
The second set of brackets is a second, separate operation: DT[subgroup] creates a new data table from DT (all columns, for the selected rows), and then [, !'Group'] operates on that intermediate data table, creating another new data table. Hence the decline in speed.
# Faster way
rcorr(DT[subgroup, !'Group'])
This way operates only on DT, all in one go.
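If you want to check this yourself, here is a rough sketch of a comparison. The data, the column names x, y and z, and the logical vector subgroup are all made up for illustration, and I have left out rcorr so that only the subsetting step is timed. The exact numbers will depend on your machine and data size.

```r
library(data.table)

set.seed(1)
n <- 1e5

# Hypothetical data: one grouping column plus three numeric columns
DT <- data.table(
  Group = sample(c("a", "b"), n, replace = TRUE),
  x = rnorm(n),
  y = rnorm(n),
  z = rnorm(n)
)

# Assumed row selector: a logical vector picking one group
subgroup <- DT$Group == "a"

# Chained form: first call builds an intermediate table with every column,
# second call builds another table without Group
chained <- DT[subgroup][, !"Group", with = FALSE]

# Single-call form: rows and columns are selected in one operation
single <- DT[subgroup, !"Group", with = FALSE]

# Both forms should hold the same data
print(isTRUE(all.equal(chained, single)))

# Rough timings over repeated calls
print(system.time(for (i in 1:50) DT[subgroup][, !"Group", with = FALSE]))
print(system.time(for (i in 1:50) DT[subgroup, !"Group", with = FALSE]))
```

The results are the same either way, so the only thing the extra brackets buy you is an extra intermediate table per call, which adds up when the subset is taken repeatedly.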