I have a data frame, and for each group I want to find which column holds the maximum value and output that column's name. One complication: if there is no single maximum (e.g., all the numeric values are equal), I want to return all_equal, and if two columns tie for the maximum over the third one, I want to output both of those column names.
Here is the sample data:
test <- data.frame(A=c(5,NA,NA,1,NA,NA,3,NA,NA),B=c(NA,2,NA,NA,1,NA,NA,1,NA),C=c(NA,NA,1,NA,NA,1,NA,NA,3),gr=gl(3,3))
A B C gr
1 5 NA NA 1
2 NA 2 NA 1
3 NA NA 1 1
4 1 NA NA 2
5 NA 1 NA 2
6 NA NA 1 2
7 3 NA NA 3
8 NA 1 NA 3
9 NA NA 3 3
In each gr there are values in columns A, B and C. My goal is to find which column has the maximum value in that group and output that column's name to a new column called col_name.
If all values are equal to each other, as in gr=2, the output is all_equal.
If two of the columns share the max value over the third column, as in gr=3, the output is the column names A&C in col_name.
I realized that it might be difficult to build a pipeline without gather, so I tried:
library(dplyr)
library(tidyr)  # gather() comes from tidyr
test%>%
group_by(gr)%>%
gather(variable, value, -gr) %>%
arrange(gr)%>%
mutate(col_name=variable[which.max(value)])
# A tibble: 18 x 4
# Groups: gr [2]
gr variable value col_name
<fct> <chr> <dbl> <chr>
1 1 A 5 A
2 1 A NA A
3 1 A NA A
4 1 B NA A
5 1 B 2 A
6 1 B NA A
7 1 C NA A
8 1 C NA A
9 1 C 1 A
10 2 A 1 A
11 2 A NA A
12 2 A NA A
13 2 B NA A
14 2 B 1 A
15 2 B NA A
16 2 C NA A
17 2 C NA A
18 2 C 1 A
The problem I am struggling with is how to output all_equal when the max values are equal across columns A, B and C, and, when two columns share the max value (A and C in gr=3), how to output those column names in the format A&C in col_name.
The expected output would be
> test
A B C gr col_name
1 5 NA NA 1 A
2 NA 2 NA 1 A
3 NA NA 1 1 A
4 1 NA NA 2 all_equal
5 NA 1 NA 2 all_equal
6 NA NA 1 2 all_equal
7 3 NA NA 3 A&C
8 NA 1 NA 3 A&C
9 NA NA 3 3 A&C
thx in advance!
Here is a dplyr approach that I tried to make a little more generalized to accommodate a different number of columns of interest. With your test data frame from above, start by defining a function that finds the max of the current group, gets indices for columns with matching values, then builds the output based on the number of matching columns:
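foo <- function(df_, cols = 1:3) {
  # Maximum over the selected columns of the current group, ignoring NA
  m <- max(df_[, cols], na.rm = TRUE)
  # Indices (within cols) of the columns containing the max; unique() guards
  # against one column holding the max in more than one row
  ix <- unique(as.data.frame(which(df_[, cols] == m, arr.ind = TRUE))[, 2])
  matchlen <- length(ix)
  columns <- names(df_[, cols])[ix]
  # "all_equal" if every column reaches the max, otherwise the joined names
  out <- if (matchlen == length(cols)) "all_equal" else paste(columns, collapse = "&")
  df_$col_name <- out
  return(df_)
}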
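To make the steps inside the function easier to follow, here is a rough base R walk-through on a single group (gr == 3). The intermediate names g3, pos, hits and label are just illustrative and not part of the function above:

```r
# Sample data from the question
test <- data.frame(A = c(5, NA, NA, 1, NA, NA, 3, NA, NA),
                   B = c(NA, 2, NA, NA, 1, NA, NA, 1, NA),
                   C = c(NA, NA, 1, NA, NA, 1, NA, NA, 3),
                   gr = gl(3, 3))

# Take one group (gr == 3) and only the three value columns
g3 <- test[test$gr == 3, c("A", "B", "C")]

# Step 1: the group maximum, ignoring NA
m <- max(g3, na.rm = TRUE)

# Step 2: row/column positions of the cells equal to the maximum
# (cells that are NA compare as NA and are skipped by which())
pos <- which(g3 == m, arr.ind = TRUE)

# Step 3: column names at those positions
hits <- names(g3)[pos[, "col"]]

# Step 4: "all_equal" if every column matched, otherwise join the names
label <- if (length(hits) == ncol(g3)) "all_equal" else paste(hits, collapse = "&")
print(label)
```

For this group the maximum is 3 and it sits in columns A and C, so the label comes out as A&C.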
Because the output from foo is a data frame, you need to make use of do to apply it to groups with dplyr:
test %>%
group_by(gr) %>%
do(foo(.))
# A tibble: 9 x 5
# Groups: gr [3]
A B C gr col_name
<dbl> <dbl> <dbl> <fct> <chr>
1 5 NA NA 1 A
2 NA 2 NA 1 A
3 NA NA 1 1 A
4 1 NA NA 2 all_equal
5 NA 1 NA 2 all_equal
6 NA NA 1 2 all_equal
7 3 NA NA 3 A&C
8 NA 1 NA 3 A&C
9 NA NA 3 3 A&C
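As a side note, do() is marked as superseded in recent dplyr versions, so here is a minimal sketch of the same idea with a plain grouped mutate instead. This is an alternative, not the approach above: label_max is a hypothetical helper name, and it compares per-column maxima rather than scanning cells, which gives the same result on this data but assumes every column has at least one non-NA value in each group.

```r
library(dplyr)

# Sample data from the question
test <- data.frame(A = c(5, NA, NA, 1, NA, NA, 3, NA, NA),
                   B = c(NA, 2, NA, NA, 1, NA, NA, 1, NA),
                   C = c(NA, NA, 1, NA, NA, 1, NA, NA, 3),
                   gr = gl(3, 3))

# Hypothetical helper: takes named vectors (one per column of the current
# group) and returns a single label for that group
label_max <- function(...) {
  cols <- list(...)
  # Per-column maximum within the group, ignoring NA
  col_max <- sapply(cols, function(x) max(x, na.rm = TRUE))
  # Names of the columns whose maximum equals the overall maximum
  hits <- names(cols)[col_max == max(col_max)]
  if (length(hits) == length(cols)) "all_equal" else paste(hits, collapse = "&")
}

res <- test %>%
  group_by(gr) %>%
  mutate(col_name = label_max(A = A, B = B, C = C)) %>%  # scalar is recycled per group
  ungroup()

print(res)
```

The columns are passed by name here, so supporting a different set of columns means changing the call inside mutate rather than a cols argument.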
The function should allow for a flexible number of columns to be input, as long as they're numeric. For example,
test %>%
group_by(gr) %>%
do(foo(., cols = 1:2))
and
test %>%
group_by(gr) %>%
do(foo(., cols = c(1,3)))
both seem to work.
Edit:
Yeah, I guess you can pass column names instead of indices!
test %>%
group_by(gr) %>%
do(foo(., cols = c("A", "B", "C")))