A very basic programming question here, but for future reference I just want to know the best way to handle this common situation.
I have three columns with values between 0 and 10, and I wish to determine which of them has the highest value in each row and display that column's name (in a mutated or otherwise created 'Largest' column). In the event of ties I prefer c over b over a, since this switch will be used to pull values from other columns which might not be equivalent like these are.
The code below does the trick, but is there a shorter and simpler way?
library(dplyr)
set.seed(7)
mat <- matrix(as.integer(runif(15, 0, 10)), nrow = 5, ncol = 3)
colnames(mat) <- letters[1:3]
(mat)
matBestOf <-
  data.frame(mat) %>%
  mutate(Largest = ifelse(c >= b & c >= a, "c",
                   ifelse(b >= c & b >= a, "b",
                          "a"))
  )
matBestOf
# a b c Largest
# 1 9 7 1 a
# 2 3 3 2 b
# 3 1 9 7 b
# 4 0 1 0 b
# 5 2 4 4 c
I tried using the max() function, but I can only get it to return the highest value instead of the name of the column holding it. I am also apparently not comparing all three columns, as the results only ever come from a and c, and never from b. Finally, it seems I cannot favor the higher letter on ties, which is okay; maybe I can live without that added feature.
matBestOf <-
data.frame(mat) %>%
rowwise %>%
mutate(Largest = max(a:c))
matBestOf
# Source: local data frame [5 x 4]
# Groups: <by row>
#
# a b c Largest
# (int) (int) (int) (int)
# 1 9 7 1 9
# 2 3 3 2 3
# 3 1 9 7 7
# 4 0 1 0 0
# 5 2 4 4 4
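A note on why b never shows up in the attempt above: inside mutate(), a:c is R's sequence operator applied to the row's values, so max(a:c) is the maximum of the integers running from a to c and b is never looked at. The sketch below is only an illustration in base R; the data frame df is rebuilt by hand from the printed values (an assumption made so the example does not depend on the random seed), and pmax() is one possible way to get the row-wise maximum value, not the column name.

```r
# Values copied from the printed output above (hypothetical helper object `df`).
df <- data.frame(a = c(9L, 3L, 1L, 0L, 2L),
                 b = c(7L, 3L, 9L, 1L, 4L),
                 c = c(1L, 2L, 7L, 0L, 4L))

# In row 3, a:c is the integer sequence 1:7, so b = 9 never takes part.
seq_max <- max(df$a[3]:df$c[3])
seq_max
# [1] 7

# pmax() compares the columns element-wise and needs no rowwise().
row_max <- pmax(df$a, df$b, df$c)
row_max
# [1] 9 3 9 1 4
```

This still returns values rather than column names, so it only addresses the first half of the problem.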
Here's an option with max.col:
mat %>%
  data.frame() %>%
  select(c, b, a, everything()) %>%
  mutate(Largest = names(.)[max.col(., ties.method = "first")])
# c b a Largest
#1 1 7 9 a
#2 2 3 3 b
#3 7 9 1 b
#4 0 1 0 b
#5 4 4 2 c
I use select to put the columns in the order you specified so that we can simply use ties.method = "first". The everything() ensures that any other columns, if present, are also selected but appear after the first three.
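As a small illustration of the tie handling, here is a base R sketch (no dplyr) that applies max.col to the matrix directly. The matrix is rebuilt by hand from the printed values, which is an assumption made to keep the example independent of the random seed. It suggests that reordering the columns and using ties.method = "first" gives the same labels as keeping the original order and using ties.method = "last".

```r
# Values copied from the printed output (columns a, b, c).
mat <- cbind(a = c(9L, 3L, 1L, 0L, 2L),
             b = c(7L, 3L, 9L, 1L, 4L),
             c = c(1L, 2L, 7L, 0L, 4L))

# Original column order: on ties take the last (right-most) maximum.
last_wins <- colnames(mat)[max.col(mat, ties.method = "last")]

# Reversed column order (c, b, a): on ties take the first maximum.
flipped <- mat[, ncol(mat):1]
first_wins <- colnames(flipped)[max.col(flipped, ties.method = "first")]

last_wins
# [1] "a" "b" "b" "b" "c"
identical(last_wins, first_wins)
# [1] TRUE
```

Setting ties.method explicitly matters here because the default for max.col is "random", which would break ties unpredictably.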
Using apply, and rev to give priority to c over b over a:
cbind.data.frame(mat,
                 Largest = apply(mat, 1,
                                 function(i) rev(colnames(mat))[rev(i) == max(i)][1]))
# a b c Largest
# 1 9 7 1 a
# 2 3 3 2 b
# 3 1 9 7 b
# 4 0 1 0 b
# 5 2 4 4 c
Edit: Benchmarking
Taking rev outside apply makes the code 3-4 times faster on a bigger dataset, though it is still not as fast as the dplyr solution.
library(dplyr)
# bigger dummy data
bigmat <- matrix(rep(mat, 10000), ncol = 20)
colnames(bigmat) <- letters[1:ncol(bigmat)]
microbenchmark::microbenchmark(
  dplyr = {bigmat %>%
      data.frame() %>%
      select(c, b, a, everything()) %>%
      mutate(Largest = names(.)[max.col(., ties.method = "first")])},
  base_apply_v1 = {
    cbind.data.frame(bigmat,
                     Largest = apply(bigmat, 1,
                                     function(i) rev(colnames(bigmat))[rev(i) == max(i)][1]))
  },
  base_apply_v2 = {
    # reverse the column order once, outside apply; rows keep their order
    # so the labels stay aligned with bigmat
    myFlip <- bigmat[, ncol(bigmat):1]
    myNames <- colnames(myFlip)
    cbind.data.frame(bigmat,
                     Largest = apply(myFlip, 1,
                                     function(i) myNames[i == max(i)][1]))
  }
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# dplyr 3.271673 3.52583 4.665696 3.730951 5.915583 8.405259 100 a
# base_apply_v1 86.191320 91.94412 99.370839 93.709812 96.214598 196.007909 100 c
# base_apply_v2 23.121803 26.70536 30.906054 28.042854 29.065466 134.257780 100 b
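Timings are only comparable if the variants return the same labels, so a quick equivalence check is worth running first. The sketch below is a base R illustration under a few assumptions: the small matrix is rebuilt by hand from the printed values, the object names v1, v2 and v3 are hypothetical, and v3 uses max.col on the matrix directly rather than the dplyr pipeline that was benchmarked. Only the columns are reversed for v2; reversing the rows as well would return the labels in reversed row order.

```r
# Small matrix with the printed values, then the same enlargement as in the benchmark.
mat <- cbind(a = c(9L, 3L, 1L, 0L, 2L),
             b = c(7L, 3L, 9L, 1L, 4L),
             c = c(1L, 2L, 7L, 0L, 4L))
bigmat <- matrix(rep(mat, 10000), ncol = 20)
colnames(bigmat) <- letters[1:ncol(bigmat)]

# v1: rev() is called inside apply for every row.
v1 <- apply(bigmat, 1,
            function(i) rev(colnames(bigmat))[rev(i) == max(i)][1])

# v2: columns reversed once up front, rows left in place.
flipped <- bigmat[, ncol(bigmat):1]
flippedNames <- colnames(flipped)
v2 <- apply(flipped, 1, function(i) flippedNames[i == max(i)][1])

# v3: vectorised max.col, right-most maximum wins on ties.
v3 <- colnames(bigmat)[max.col(bigmat, ties.method = "last")]

identical(v1, v2)
identical(v1, v3)
```

Both comparisons should come back TRUE, since all three variants pick the right-most column holding the row maximum.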