Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data.table: subset by index

Tags:

r

data.table

In the following data table I'd like to select rows with unique id and the smallest value from the column pos2:

dt = data.table(id = c(1,2,2,3,3,3),
            pos1 = c(0.1, 0.2, 0.2, 0.3, 0.3, 0.3),
            pos2 = c(0.1, 0.25, 0.21, 0.34, 0.31, 0.32))

   id pos1 pos2
1:  1  0.1 0.10
2:  2  0.2 0.25
3:  2  0.2 0.21
4:  3  0.3 0.34
5:  3  0.3 0.31
6:  3  0.3 0.32

The way I'm doing it now is by creating an intermediate table:

dt.red = dt[, .(pos2 = first(sort(pos2))), by = id]

   id pos2
1:  1 0.10
2:  2 0.21
3:  3 0.31

Then I merge to obtain the desired end result:

merge(dt, dt.red)

   id pos2 pos1
1:  1 0.10  0.1
2:  2 0.21  0.2
3:  3 0.31  0.3

Is there a cleaner way of achieving that with data.table?

like image 244
mattek Avatar asked Oct 20 '25 06:10

mattek


2 Answers

It's also possible to do this without .I, but it'll be slower*

dt[order(pos2), head(.SD, 1), id]
#    id pos1 pos2
# 1:  1  0.1 0.10
# 2:  2  0.2 0.21
# 3:  3  0.3 0.31

*Or maybe not, see comments below

like image 50
IceCreamToucan Avatar answered Oct 22 '25 21:10

IceCreamToucan


We can use .I to get the row index and use that in i for subsetting the rows

dt[dt[order(pos2), .I[1], by = id]$V1]
#   id pos1 pos2
#1:  1  0.1 0.10
#2:  2  0.2 0.21
#3:  3  0.3 0.31

Or with setorder

setorder(dt, id, pos2)[, .SD[1L], id]
like image 36
akrun Avatar answered Oct 22 '25 19:10

akrun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!