Pass data.table column by name in function

Tags: r, data.table

I want to pass a column name to a function and use column indexing and the setorder function:

require(data.table)
data(iris)

top3 = function(t, n) {
  setorder(t, n, order=-1)
  return ( t[1:3, .(Species, n)])
}

DT = data.table(iris)
top3(DT, Petal.Width)

However, this returns an error:

Error in setorderv(x, cols, order, na.last) : some columns are not in the data.table: n,1

I think I'm misunderstanding how passing bare column names works in R. What are my options?

tkerwin asked Apr 26 '16 17:04


2 Answers

You can capture the bare column name with substitute and evaluate the resulting expression:

top3 = function(DT, nm) eval(substitute( DT[order(-nm), .(Species, nm)][, head(.SD, 3L)] ))
top3(DT, Petal.Width)

     Species Petal.Width
1: virginica         2.5
2: virginica         2.5
3: virginica         2.5

I would advise against (1) setorder inside a function, since it has side effects; (2) indexing with 1:3 when you may use this on a data.table with fewer than three rows in the future, to strange effect; (3) fixing 3 instead of making it an argument to the function; and (4) using n for name... just my personal preference to reserve n for counts.
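A minimal sketch of a variant that follows the four points above, in case it helps. The function name `top_n_by` and the row-count argument `n` are hypothetical and not part of the original answer; the `Species` column is still hard-coded, as in the question.

```r
library(data.table)

# top_n_by is an illustrative name; n (a count) defaults to 3 as in the question.
top_n_by <- function(DT, nm, n = 3L) {
  col_sym <- substitute(nm)  # capture the bare column name as an unevaluated symbol
  # Build DT[order(-<col>), .(Species, <col>)] and evaluate it.
  # order() in i returns a sorted copy, so the caller's table is not reordered.
  sorted <- eval(substitute(DT[order(-col), .(Species, col)], list(col = col_sym)))
  head(sorted, n)            # head() is safe when the table has fewer than n rows
}

DT <- data.table(iris)
top_n_by(DT, Petal.Width)
top_n_by(DT, Sepal.Length, n = 5L)
```

Because the sort happens through `order()` in `i` rather than `setorder`, `DT` keeps its original row order after the call.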

Frank answered Oct 29 '22 14:10


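Assuming your dataset will always have at least 3 rows and that this is the ONLY operation you want to perform on that data table, it may be in your interest to use setorderv instead, which takes the column names as a character vector. Note that setorderv reorders the table by reference, so DT itself stays sorted after the call.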

top3 = function(t, n) {
  setorderv(t, n, -1)
  return ( t[1:3, c("Species", n), with=FALSE])
}

DT = data.table(iris)
top3(DT, "Petal.Width")

Result:

     Species Petal.Width
1: virginica         2.5
2: virginica         2.5
3: virginica         2.5
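
If reordering the caller's table by reference is not wanted, a non-mutating variant that still takes the column name as a string might look like the sketch below. The name `top_n_chr` and the `n` argument are illustrative assumptions, not part of the answer above.

```r
library(data.table)

# top_n_chr is a hypothetical helper; col is a column name given as a string.
top_n_chr <- function(DT, col, n = 3L) {
  # Compute the descending row order with base order(); DT is left untouched.
  idx <- order(DT[[col]], decreasing = TRUE)
  # with = FALSE makes j a character vector of column names, as in the answer above.
  head(DT[idx, c("Species", col), with = FALSE], n)
}

DT <- data.table(iris)
top_n_chr(DT, "Petal.Width")
```

Using `head()` instead of `1:3` also avoids NA-filled rows when the table has fewer than `n` rows.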
Serban Tanasa answered Oct 29 '22 13:10