I was surprised by the timing of the following:
R) system.time(lastOrder <- order[,lapply(.SD,tail,1),by="TRADER_ID,EXEC_IDATE"]);
   user  system elapsed
   1.45    0.00    1.53
R) nrow(order)
[1] 75301
R) ncol(order)
[1] 23
That seemed very slow, so I then tried:
R) system.time(lastOrder <- order[,list(test=tail(EXEC_IDATE,1)),by="TRADER_ID,EXEC_IDATE"]);
   user  system elapsed
   0.14    0.00    0.14
As far as I understand, once you know which rows to select, most of the work is done, so I don't see why applying this to all 23 columns should take 10x longer. Am I doing something wrong in the first snippet? It is the only way I know to select the last row of each group.
To get the last row by group:
DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"] # (1)
or, faster (avoid using .SD where possible, for speed):
w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]] # (2)
DT[w]
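As a minimal sketch of how the two approaches relate, the snippet below runs both on a small made-up table. The data and the PRICE column are hypothetical and are only there for illustration; the column names TRADER_ID and EXEC_IDATE mirror the question. It uses `$V1` instead of `[[3]]` to pick out the row-number column, which should be equivalent here since data.table names an unnamed `j` result `V1`.

```r
library(data.table)

# Hypothetical toy data: values are made up, only the grouping
# column names come from the question.
DT <- data.table(
  TRADER_ID  = c("A", "A", "A", "B", "B", "C"),
  EXEC_IDATE = as.IDate(c("2013-01-02", "2013-01-02", "2013-01-03",
                          "2013-01-02", "2013-01-02", "2013-01-04")),
  PRICE      = c(10, 11, 12, 20, 21, 30)
)

# Approach (1): take the last row of .SD within each group.
res1 <- DT[, .SD[.N], by = "TRADER_ID,EXEC_IDATE"]

# Approach (2): .I holds row numbers in the full table, so .I[.N] is the
# position of each group's last row; collect those, then subset DT once.
w <- DT[, .I[.N], by = "TRADER_ID,EXEC_IDATE"]$V1
res2 <- DT[w]

print(res1)
print(res2)
```

Both results should contain the same four rows. The difference is that (1) builds a .SD subset for every group, while (2) only computes one integer per group and then does a single subset of DT; how large the speed gap is will depend on the data and the data.table version.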
Note that the following feature request will make approach (1) as fast as approach (2):
FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.