Imagine a dataframe:
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a = sample(1:10, 26, replace = TRUE),
                   b = sample(1:10, 26, replace = TRUE),
                   c = sample(1:10, 26, replace = TRUE))
For each id, I'd like to keep the name of the column that holds the largest value.
The result I'm looking for is a 26 x 2 data frame with one column for id and one for largest_value_var. Each largest_value_var entry would be a, b, or c.
So far, I have been able to extract the variable name with which the max value is associated using this:
apply(data[, -1], 1, function(x) names(x)[which.max(x)])
But I can't quite get the result into a data frame. Any help is appreciated.
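Since apply() already returns one column name per row, in row order, a minimal sketch of the missing step is to pair that vector with id using data.frame(). The column name largest_value_var is taken from the question; note that which.max() returns the first maximum, so ties go to the leftmost column.

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a = sample(1:10, 26, replace = TRUE),
                   b = sample(1:10, 26, replace = TRUE),
                   c = sample(1:10, 26, replace = TRUE))

# apply() over rows returns a character vector with one column name per row
largest <- apply(data[, -1], 1, function(x) names(x)[which.max(x)])

# Pair it with the id column to get the 26 x 2 data frame
result <- data.frame(id = data$id, largest_value_var = largest)
```

The rows of result stay in the same order as data, because apply() walks the rows in order.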
You can do this fairly easily with max.col(). With ties.method = "first" (thanks akrun), the first column wins in the case of a tie. Here's a data.table method:
library(data.table)
# note: setDT() converts 'data' to a data.table by reference (in place)
setDT(data)[, names(.SD)[max.col(.SD, "first")], by = id]
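To see what ties.method changes, here is a tiny hypothetical matrix (not the question's data) in which row 1 ties between columns a and c. The default for max.col() is ties.method = "random", so without "first" a tied row could come back as either column from one run to the next.

```r
# Hypothetical 2 x 3 matrix; row 1 has a tie between columns "a" and "c"
m <- matrix(c(5, 1, 5,
              2, 9, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# "first" picks the leftmost maximum, "last" the rightmost
first <- colnames(m)[max.col(m, ties.method = "first")]
last  <- colnames(m)[max.col(m, ties.method = "last")]

first  # → "a" "b"
last   # → "c" "b"
```

Rows without ties (row 2 here) give the same answer under every ties.method.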
Update: It seems this is more efficient in base R, probably because max.col() calls as.matrix() on its input, and grouping by id means that conversion happens once per row instead of once for the whole table. So here's one way to do it in base R:
df <- as.data.frame(data)  # plain data.frame, in case setDT() above already converted 'data'
cbind(df[1], largest = names(df)[-1][max.col(df[-1], "first")])
Thanks to Ananda Mahto for pointing out the efficiency difference.
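A rough base-R sketch of the per-call overhead mentioned in the update follows. It assumes that a grouped by = id evaluation behaves roughly like calling max.col() once per row; the loop below only mimics that and is not how data.table works internally. Both versions return the same column names, while the timings show the cost of many small as.matrix() conversions.

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a = sample(1:10, 26, replace = TRUE),
                   b = sample(1:10, 26, replace = TRUE),
                   c = sample(1:10, 26, replace = TRUE))
vals <- data[-1]

# One max.col() call on the whole table: a single as.matrix() conversion
whole <- names(vals)[max.col(vals, "first")]

# One max.col() call per row, mimicking a per-group evaluation:
# each call converts a one-row data frame to a matrix
per_row <- vapply(seq_len(nrow(vals)),
                  function(i) names(vals)[max.col(vals[i, ], "first")],
                  character(1))

stopifnot(identical(whole, per_row))

# Rough timings; absolute numbers depend on the machine and R version
system.time(for (k in 1:200) names(vals)[max.col(vals, "first")])
system.time(for (k in 1:200) vapply(seq_len(nrow(vals)),
  function(i) names(vals)[max.col(vals[i, ], "first")], character(1)))
```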
I like @Richard's use of max.col, but the first thing that came to my mind was to get the data into a "tidy" form first, after which the subsetting you want should be easy:
library(reshape2)
library(data.table)
melt(as.data.table(data), id.vars = "id")[, variable[which.max(value)], by = id]
# id V1
# 1: c b
# 2: p a
# 3: o c
# 4: x b
# 5: s a
## SNIP ###
# 21: g a
# 22: f b
# 23: t a
# 24: y a
# 25: w b
# 26: v a
# id V1
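If you'd rather not load any packages, here's a rough base-R sketch of the same reshape-then-subset idea, swapping melt() for stack() and split(). The helper pick() and the object names are hypothetical. Unlike the data.table output above, split() orders the groups alphabetically by id.

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a = sample(1:10, 26, replace = TRUE),
                   b = sample(1:10, 26, replace = TRUE),
                   c = sample(1:10, 26, replace = TRUE))

# Long ("tidy") form: one row per id/variable pair.
# stack() returns a 'values' column and an 'ind' factor holding the column name,
# with all of column a first, then b, then c, so id is repeated once per column.
long <- data.frame(id = rep(data$id, times = ncol(data) - 1),
                   stack(data[-1]))

# Hypothetical helper: within one id, the variable with the largest value
# (which.max() takes the first one, i.e. a before b before c, on ties)
pick <- function(d) as.character(d$ind[which.max(d$values)])

parts <- split(long, long$id)
result <- data.frame(id = names(parts),
                     largest_value_var = vapply(parts, pick, character(1)),
                     row.names = NULL)
```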