
Select row from data.table with min value

Tags: r, data.table

I have a data.table. I need to compute a new value from it and then select the row where that value is smallest.

library(data.table)
tb <- data.table(g_id=c(1, 1, 1, 2, 2, 2, 3),
          item_no=c(24,25,26,27,28,29,30),
          time_no=c(100, 110, 120, 130, 140, 160, 160),
          key="g_id")

#    g_id item_no time_no
# 1:    1      24     100
# 2:    1      25     110
# 3:    1      26     120
# 4:    2      27     130
# 5:    2      28     140
# 6:    2      29     160
# 7:    3      30     160

ts  <- 118
gId <- 2

tb[.(gId), list(item_no, tdiff = abs(time_no - ts))]

#    g_id item_no tdiff
# 1:    2      27    12
# 2:    2      28    22
# 3:    2      29    42

Now I need the row with the minimal tdiff (actually, only the item_no of that row):

#    g_id item_no tdiff
# 1:    2      27    12

Can I do this in one operation on tb? What is the fastest way to do it? I need to run this operation on about 500,000 rows.

Katerina asked Mar 29 '14 08:03



1 Answer

You can use .SD with a chained [][] query.

As I understand it, the problem has two steps: first add a new column, then find the row with the minimal tdiff.

library(data.table)
tb <- data.table(g_id=c(1, 1, 1, 2, 2, 2, 3),
             item_no=c(24,25,26,27,28,29,30),
             time_no=c(100, 110, 120, 130, 140, 160, 160),
             key="g_id")

ts <- 118

# Add tdiff by reference, then take the row with the smallest tdiff in each g_id group
tb[, tdiff := abs(time_no - ts)][, .SD[which.min(tdiff)], by = key(tb)]

I think .SD is the more appropriate tool here, and := lets you add the new column by reference.

This is the output:

   g_id item_no time_no tdiff
1:    1      26     120     2
2:    2      27     130    12
3:    3      30     160    42
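
The result above covers every group and adds a tdiff column to tb. If only the item_no for a single g_id is needed, as the question asks, a minimal sketch that leaves tb unmodified could look like the following. It also shows an alternative per-group form based on the .I row-index symbol instead of .SD, which may be faster on large tables with many groups; that is an assumption worth benchmarking on your own data. The names res_one, idx and res_all are made up for illustration.

```r
library(data.table)

tb <- data.table(g_id    = c(1, 1, 1, 2, 2, 2, 3),
                 item_no = c(24, 25, 26, 27, 28, 29, 30),
                 time_no = c(100, 110, 120, 130, 140, 160, 160),
                 key = "g_id")
ts  <- 118
gId <- 2

# One group, one value: keyed join on g_id, then pick item_no at the
# position of the smallest |time_no - ts|. tb itself is not modified.
res_one <- tb[.(gId), item_no[which.min(abs(time_no - ts))]]

# All groups at once, without .SD: collect the row index (.I) of each
# group's minimum, then subset tb once with those indices.
idx <- tb[, .I[which.min(abs(time_no - ts))], by = g_id]$V1
res_all <- tb[idx]
```

With the sample data, res_one is 27 and res_all holds the rows with item_no 26, 27 and 30, matching the output above. Note that which.min returns only the first minimum, so ties within a group resolve to the earliest row.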
Bigchao answered Sep 29 '22 12:09