Select last row by group for all columns data.table

Tags:

r

data.table

I was surprised by how long the following took:

R) system.time(lastOrder <- order[,lapply(.SD,tail,1),by="TRADER_ID,EXEC_IDATE"]);
       user      system     elapsed 
       1.45        0.00        1.53 
R) nrow(order)
[1] 75301
R) ncol(order)
[1] 23

That seemed very slow, so I then tried:

R) system.time(lastOrder <- order[,list(test=tail(EXEC_IDATE,1)),by="TRADER_ID,EXEC_IDATE"]);
       user      system     elapsed 
       0.14        0.00        0.14 

As far as I understand, once the rows to select are known, most of the work is done, so I don't see why applying this to all columns should take 10x longer. Am I doing something wrong in the first piece of code? It is the only way I know to select the last row of each group.

statquant asked Jan 03 '13 16:01


1 Answer

Last row by group:

DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"]            # (1)

Or, faster (avoid using .SD where possible, for speed):

w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]]    # (2)
DT[w]
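
To make the two approaches concrete, here is a minimal sketch on a made-up table. Only the column names TRADER_ID and EXEC_IDATE come from the question; the PRICE column and all values are assumptions for illustration. It shows that .I[.N] yields the row number in DT of each group's last row, and that the grouped result holds the two grouping columns followed by V1, which is why [[3]] picks out the indices.

```r
library(data.table)

# Hypothetical toy data; PRICE and all values are invented for illustration.
DT <- data.table(
  TRADER_ID  = c("A", "A", "A", "B", "B"),
  EXEC_IDATE = c(1L, 1L, 2L, 1L, 1L),
  PRICE      = c(10, 11, 12, 20, 21)
)

# Approach (1): row .N (the last row) of each group's subset .SD.
res1 <- DT[, .SD[.N], by = "TRADER_ID,EXEC_IDATE"]

# Approach (2): .I holds row numbers within DT, so .I[.N] is the position
# of the last row of each group. The result has the two grouping columns
# followed by V1, so idx$V1 is the same vector as idx[[3]].
idx <- DT[, .I[.N], by = "TRADER_ID,EXEC_IDATE"]
w <- idx$V1
res2 <- DT[w]

print(w)     # row numbers of the last row in each group
print(res2)  # the selected rows, all columns
```

Both res1 and res2 should contain the same rows; approach (2) simply avoids materialising .SD for every group.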

Note that the following feature request will make approach (1) as fast as approach (2):

FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.
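
For anyone who wants to compare the approaches on their own machine, the following is a rough timing sketch on synthetic data. The row and column counts (75301 and 23) match the question, but the group sizes, the X1..X21 column names and the random values are assumptions. Timings depend heavily on the data.table version, since later releases may optimise some of these forms internally, so no expected figures are given.

```r
library(data.table)

set.seed(1)
n <- 75301L  # row count reported in the question

# Synthetic stand-in for the question's table; group cardinalities are guesses.
DT <- data.table(
  TRADER_ID  = sample(50L, n, replace = TRUE),
  EXEC_IDATE = sample(250L, n, replace = TRUE)
)

# Pad to 23 columns with hypothetical numeric columns X1..X21.
cols <- paste0("X", 1:21)
DT[, (cols) := lapply(1:21, function(j) rnorm(n))]

# The question's original form: tail() applied to every column per group.
t_lapply <- system.time(
  a <- DT[, lapply(.SD, tail, 1), by = "TRADER_ID,EXEC_IDATE"]
)

# Approach (1): subset .SD per group.
t_sd <- system.time(
  b <- DT[, .SD[.N], by = "TRADER_ID,EXEC_IDATE"]
)

# Approach (2): compute row numbers once, then subset DT a single time.
t_i <- system.time(
  d <- DT[DT[, .I[.N], by = "TRADER_ID,EXEC_IDATE"]$V1]
)

print(rbind(lapply_tail = t_lapply, sd_n = t_sd, i_n = t_i)[, 1:3])
```

All three results should hold the same rows in the same group order, so the comparison is purely about speed.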

Matt Dowle answered Nov 05 '22 13:11