Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simultaneous order, row-filter and column-select with data.table

Tags:

r

data.table

I am trying to do multiple steps in one line in R to select a value from a data.table (dt) with multiple criteria.

For example:

set.seed(123)
dt <- data.table(id = rep(letters[1:2],2),
             time = rnorm(4),
             value = rnorm(4)*100)

#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  a  1.55870831   46.09162
# 4:  b  0.07050839 -126.50612

# Now I want to select the last (maximum time) value from id == "a"
# My pseudo data.table code looks like this

dt[order(time) & id == "a" & .N, value]
# [1] 12.92877 46.09162  

Instead of getting the two values I want to get only the last value (which has the higher time-value).

If I do it step-by-step it works:

dt <- dt[order(time) & id == "a"]
dt[.N, value]
# [1] 46.09162

Bonus:

How can I order a data.table without copying the data.table: ie

dt <- dt[order(time)]

without the <-. Similar to the :=-operator such as in dt[, new_val := value*2] which creates the new variable without copying the whole data.table.

Thank you, any idea is greatly appreciated!

like image 905
David Avatar asked Jun 02 '15 20:06

David


1 Answers

For you first question, try

dt[id == "a", value[which.max(time)]]
## [1] 46.09162

For bonus question, try the setorder function which will order your data in place (you can also order in descending order by adding - in front of time)

setorder(dt, time)
dt
#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  b  0.07050839 -126.50612
# 4:  a  1.55870831   46.09162

Also, if you already ordering your data by time, you could do both - order by reference and select value by condition- in single line

setorder(dt, time)[id == "a", value[.N]]
like image 74
David Arenburg Avatar answered Oct 11 '22 22:10

David Arenburg