How to retrieve column for row-wise maximum value in an R data.table?

I have the following R data.table:

library(data.table)
iris = as.data.table(iris)
> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
...

Suppose I want the maximum value in each row, taken over only a subset of the data.table's columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.

I would use the following code:

iris[, maximum_element := max(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), by = 1:nrow(iris)]

Which outputs

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species maximum_element
1:          5.1         3.5          1.4         0.2  setosa             5.1
2:          4.9         3.0          1.4         0.2  setosa             4.9
3:          4.7         3.2          1.3         0.2  setosa             4.7
4:          4.6         3.1          1.5         0.2  setosa             4.6
5:          5.0         3.6          1.4         0.2  setosa             5.0

For my problem, I'm actually not interested in the value, but which column the value came from, i.e. I would like the following output:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species maximum_column
1:          5.1         3.5          1.4         0.2  setosa   Sepal.Length
2:          4.9         3.0          1.4         0.2  setosa   Sepal.Length
3:          4.7         3.2          1.3         0.2  setosa   Sepal.Length
4:          4.6         3.1          1.5         0.2  setosa   Sepal.Length
5:          5.0         3.6          1.4         0.2  setosa   Sepal.Length

(In these rows, the maximum value always comes from Sepal.Length.)

How do I "retrieve" the column name with the maximum value?

ShanZhengYang asked Jul 17 '17 16:07


1 Answer

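Here is an option with pmax, which takes the element-wise (parallel) maximum across the columns passed to it, so no grouping by row is needed: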

iris[, maximum_element := do.call(pmax, .SD), .SDcols = 1:4]
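As a quick sanity check of the pmax approach, here is a minimal sketch on a made-up three-row table (the names dt, a, b, c, row_max and row_max_slow are illustrative and not from the question). It compares do.call(pmax, .SD) against the row-by-row max() used in the question:

```r
library(data.table)

# Hypothetical toy data; column names are illustrative only
dt <- data.table(a = c(1, 5, 3), b = c(4, 2, 3), c = c(2, 9, 1))

# Vectorised: do.call passes each column of .SD to pmax as a separate argument
dt[, row_max := do.call(pmax, .SD), .SDcols = c("a", "b", "c")]

# Row-by-row equivalent, for comparison (typically slower on large tables)
dt[, row_max_slow := max(a, b, c), by = seq_len(nrow(dt))]

print(dt)
```

Both columns should hold the same values; the pmax version avoids creating one group per row.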

To find the column names instead, call max.col on .SD after setting .SDcols to the numeric columns, i.e. columns 1 to 4, and use the result to index names(.SD):

iris[, maximum_column := names(.SD)[max.col(.SD)], .SDcols = 1:4]
head(iris, 4)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species maximum_column
#1:          5.1         3.5          1.4         0.2  setosa   Sepal.Length
#2:          4.9         3.0          1.4         0.2  setosa   Sepal.Length
#3:          4.7         3.2          1.3         0.2  setosa   Sepal.Length
#4:          4.6         3.1          1.5         0.2  setosa   Sepal.Length
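One thing to be aware of: max.col breaks ties at random by default (ties.method = "random"), so a row in which two columns share the maximum can give different answers between runs. A minimal sketch, assuming you would rather always get the leftmost of the tied columns, on hypothetical toy data (dt, a, b, c, cols and max_col are illustrative names):

```r
library(data.table)

# Hypothetical toy data; row 3 has a tie between columns a and b
dt <- data.table(a = c(1, 5, 3), b = c(4, 2, 3), c = c(2, 9, 1))
cols <- c("a", "b", "c")

# ties.method = "first" picks the leftmost column among tied maxima
dt[, max_col := names(.SD)[max.col(.SD, ties.method = "first")], .SDcols = cols]

print(dt)
```

Here the tied third row resolves to a; with ties.method = "last" it would resolve to b.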
akrun answered Sep 18 '22 11:09