The Fastest/Simplest Algorithm/Function to Determine Largest of Three Values

Question

A very basic programming question here, but for future reference I just want to know the best way to handle this common situation.

I have three columns with values between 0 and 10, and I want to determine which of them has the highest value in each row and display that column's name in a new 'Largest' column. In the event of a tie I prefer favoring c over b over a, since this switch will be used to pull values from other columns, which might not be equal the way these are.

The code below does the trick, but is there a shorter and simpler way?

set.seed(7)
mat <- matrix(as.integer(runif(15, 0, 10)), nrow = 5, ncol = 3)
colnames(mat) <- letters[1:3]
(mat)

library(dplyr)

matBestOf <- 
    data.frame(mat) %>% 
    mutate(Largest = ifelse(c >= b & c >= a, "c",
                     ifelse(b >= c & b >= a, "b",
                     "a"))
           )
matBestOf
#   a b c Largest
# 1 9 7 1       a
# 2 3 3 2       b
# 3 1 9 7       b
# 4 0 1 0       b
# 5 2 4 4       c

I tried using the max() function, but I can only get it to return the highest value rather than the name of the column that holds it. It also does not seem to compare all three columns: the results only ever come from a or c, never from b. Finally, I cannot see how to favor the later letter on ties, although I could probably live without that feature.

matBestOf <- 
    data.frame(mat) %>% 
    rowwise %>% 
    mutate(Largest = max(a:c))
matBestOf
# Source: local data frame [5 x 4]
# Groups: <by row>
#
#       a     b     c Largest
#   (int) (int) (int)   (int)
# 1     9     7     1       9
# 2     3     3     2       3
# 3     1     9     7       7
# 4     0     1     0       0
# 5     2     4     4       4
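
A likely explanation for the output above, offered as a sketch rather than a definitive diagnosis: a:c is the colon operator, which builds the integer sequence from a to c, so max(a:c) returns the larger of the two endpoints and never looks at b. Combining the values with c(), or using the vectorised pmax(), compares all three. The values are copied from the data shown above; the names col_a, col_b, col_c and row_max are made up for illustration.

```r
# Row 3 of the example data: a = 1, b = 9, c = 7
a <- 1L; b <- 9L; c <- 7L

a:c                # the sequence 1 2 3 4 5 6 7; b is never consulted
max(a:c)           # 7, the larger endpoint, not the row maximum
max(c(a, b, c))    # 9; c() is still found as a function despite the variable c

# pmax() is vectorised, so whole columns can be compared without rowwise()
col_a <- c(9L, 3L, 1L, 0L, 2L)
col_b <- c(7L, 3L, 9L, 1L, 4L)
col_c <- c(1L, 2L, 7L, 0L, 4L)
row_max <- pmax(col_a, col_b, col_c)
row_max            # 9 3 9 1 4
```

This only recovers the row maximum; picking the column name with a tie preference still needs something like the nested ifelse().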

talat · Accepted Answer

Here's an option with max.col:

mat %>% 
  data.frame() %>%
  select(c, b, a, everything()) %>%
  mutate(Largest = names(.)[max.col(., ties.method = "first")])

#  c b a Largest
#1 1 7 9       a
#2 2 3 3       b
#3 7 9 1       b
#4 0 1 0       b
#5 4 4 2       c

I use select to put the columns in the order you specified, so that we can simply use ties.method = "first". The everything() ensures that any other columns, if present, are also selected but appear after the first three.
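
To see how ties.method interacts with column order, here is a small base R sketch using the values from the output above. The matrix m and the variable names are made up for illustration. Note that max.col breaks ties at random by default, so the method has to be set explicitly to get a reproducible preference.

```r
# Example values copied from the output above, in a, b, c order
m <- cbind(a = c(9, 3, 1, 0, 2),
           b = c(7, 3, 9, 1, 4),
           c = c(1, 2, 7, 0, 4))

# Option 1: keep the column order and take the last maximum in each row
last_wins <- colnames(m)[max.col(m, ties.method = "last")]

# Option 2: reorder to c, b, a and take the first maximum in each row
m_rev <- m[, c("c", "b", "a")]
first_wins <- colnames(m_rev)[max.col(m_rev, ties.method = "first")]

last_wins                         # "a" "b" "b" "b" "c"
identical(last_wins, first_wins)  # TRUE
```

Either option gives c priority over b over a; reordering the columns is mainly useful when the preferred order is not simply the reverse of the existing one.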

zx8754 · Answer

Using apply, and rev to give priority to c over b over a:

cbind.data.frame(mat,
      Largest = apply(mat, 1,
                      function(i)rev(colnames(mat))[rev(i) == max(i)][1]))
#   a b c Largest
# 1 9 7 1       a
# 2 3 3 2       b
# 3 1 9 7       b
# 4 0 1 0       b
# 5 2 4 4       c
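
A possible variation on the same idea, shown only as a sketch: which.max returns the position of the first maximum, so applying it to the reversed row and indexing into the reversed names gives the same c-over-b-over-a preference without building a logical vector. The helper name largest_name is hypothetical, and the matrix m reuses the values from the output above.

```r
m <- cbind(a = c(9, 3, 1, 0, 2),
           b = c(7, 3, 9, 1, 4),
           c = c(1, 2, 7, 0, 4))

# Hypothetical helper: name of the largest column, later columns win ties
largest_name <- function(row, nms) rev(nms)[which.max(rev(row))]

Largest <- apply(m, 1, largest_name, nms = colnames(m))
data.frame(m, Largest)
```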

Edit: Benchmarking

Taking rev outside apply makes the code 3-4 times faster on bigger data, but it is still not as fast as the dplyr solution.

library(dplyr)

# bigger dummy data
bigmat <- matrix(rep(mat, 10000), ncol = 20)
colnames(bigmat) <- letters[1:ncol(bigmat)]


microbenchmark::microbenchmark(
  dplyr = {bigmat %>% 
      data.frame() %>% 
      select(c,b,a, everything()) %>%
      mutate(Largest = names(.)[max.col(., ties.method = "first")])},
  base_apply_v1 = {
    cbind.data.frame(bigmat,
                     Largest = apply(bigmat, 1,
                                     function(i)rev(colnames(bigmat))[rev(i) == max(i)][1]))
  },
  base_apply_v2 = {
    myFlip <- bigmat[, ncol(bigmat):1]  # reverse columns only, so rows stay aligned with bigmat
    myNames <- colnames(myFlip)
    cbind.data.frame(bigmat,
                     Largest = apply(myFlip, 1,
                                     function(i)myNames[i == max(i)][1]))
  }
  )

# Unit: milliseconds
#           expr       min       lq      mean    median        uq        max neval cld
#          dplyr  3.271673  3.52583  4.665696  3.730951  5.915583   8.405259   100 a  
#  base_apply_v1 86.191320 91.94412 99.370839 93.709812 96.214598 196.007909   100   c
#  base_apply_v2 23.121803 26.70536 30.906054 28.042854 29.065466 134.257780   100  b

Tags: algorithm, r, conditional-statements, dplyr

Asked by leerssej
