I am trying to do multiple steps in one line in R to select a value from a data.table (dt) with multiple criteria.
For example:
set.seed(123)
dt <- data.table(id = rep(letters[1:2],2),
time = rnorm(4),
value = rnorm(4)*100)
# id time value
# 1: a -0.56047565 12.92877
# 2: b -0.23017749 171.50650
# 3: a 1.55870831 46.09162
# 4: b 0.07050839 -126.50612
# Now I want to select the last (maximum time) value from id == "a"
# My pseudo data.table code looks like this
dt[order(time) & id == "a" & .N, value]
# [1] 12.92877 46.09162
Instead of getting the two values I want to get only the last value (which has the higher time-value).
If I do it step-by-step it works:
dt <- dt[order(time) & id == "a"]
dt[.N, value]
# [1] 46.09162
Bonus:
How can I order a data.table without copying the data.table: ie
dt <- dt[order(time)]
without the <-
. Similar to the :=
-operator such as in dt[, new_val := value*2]
which creates the new variable without copying the whole data.table.
Thank you, any idea is greatly appreciated!
For you first question, try
dt[id == "a", value[which.max(time)]]
## [1] 46.09162
For bonus question, try the setorder
function which will order your data in place (you can also order in descending order by adding -
in front of time
)
setorder(dt, time)
dt
# id time value
# 1: a -0.56047565 12.92877
# 2: b -0.23017749 171.50650
# 3: b 0.07050839 -126.50612
# 4: a 1.55870831 46.09162
Also, if you already ordering your data by time, you could do both - order by reference and select value
by condition- in single line
setorder(dt, time)[id == "a", value[.N]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With