Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I update a single row in each group?

Tags:

r

data.table

I've got data big enough to need data.table for the first time ever, and I've been very pleased with how easy it's been so far. I've read quite a bit of documentation today (certainly not nearly all of it), but I haven't found this yet.

I've got a data table keyed by placeid and t2, with one other column, t1. What I'd like to do is set t1 to 0 in each row where t2 is at its minimum, by placeid.

## Sample data
set.seed(47)
require(data.table)
dt <- data.table(placeid = rep(letters[1:3], each = 3), t1 = runif(9), t2 = runif(9))
setkeyv(dt, cols=c("placeid", "t2"))

As t2 is in the key, the row I want to change is the first one within each grouping. I was able to get it to work with an ifelse statement, but is there a better way to do it using the i argument of [.data.table?

I was hoping one of these would work, though on thinking tiny bit more it makes sense that they don't:

dt[1, t1 := 0, by = placeid] ## changes only first row
dt[which.min(t2), t1 := 0, by = placeid] ## changes only global min row

What I did find to work (the result being the desired output):

dt[, t1 := ifelse(t2 == min(t2), 0, t1), by = placeid]  # works
like image 448
Gregor Thomas Avatar asked Mar 22 '23 05:03

Gregor Thomas


1 Answers

Because you know that dt is keyed (ie sorted) by placeid and t2, then the first row by placeid is the row you wish to update.

You can use mult = 'first' to match the first row only when joining on the unique values of placeid

dt[unique(placeid),t1 := 0,mult='first']
dt
#    placeid        t1         t2
# 1:       a 0.0000000 0.13879758
# 2:       a 0.7615020 0.70198720
# 3:       a 0.9769620 0.92489205
# 4:       b 0.0000000 0.16219364
# 5:       b 0.6914124 0.50603611
# 6:       b 0.5735444 0.59930702
# 7:       c 0.0000000 0.03094497
# 8:       c 0.4689460 0.40050280
# 9:       c 0.3890619 0.90197352

If there is the possibility that you wish to update more than one row in the case where there are multiple rows where t2 == min(t2), you can use .I[t2==min(t2)] to identify these rows by placeid

dt[dt[,.I[t2 == min(t2)],by='placeid']$V1, t1 := 0]

Note that this will be slower as than the binary search option above.

like image 81
mnel Avatar answered Apr 01 '23 08:04

mnel