
How to output the columns with the maximum value

Tags: r, dplyr

I have data in which I want to find which column has the maximum value and output that column's name. One complication: if there is no single maximum (e.g., all the numeric values are equal), I want to return an all_equal label, and if two columns share the maximum compared with the third one, I want to output both of those column names.

Here is the sample data

test <- data.frame(A=c(5,NA,NA,1,NA,NA,3,NA,NA),B=c(NA,2,NA,NA,1,NA,NA,1,NA),C=c(NA,NA,1,NA,NA,1,NA,NA,3),gr=gl(3,3))

   A  B  C gr
1  5 NA NA  1
2 NA  2 NA  1
3 NA NA  1  1
4  1 NA NA  2
5 NA  1 NA  2
6 NA NA  1  2
7  3 NA NA  3
8 NA  1 NA  3
9 NA NA  3  3

In each gr there are values in columns A, B and C. My goal is to find which column has the maximum value in that group and write that column's name to a new column called col_name.

If all values are equal to each other, as in gr=2, the output should be all_equal.

If two of the columns share the maximum compared with the third column, as in gr=3, the output in col_name should be the two column names joined as A&C.

I realized that it might be difficult to build a pipeline without gather, so I tried:

library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr

test %>%
  group_by(gr) %>%
  gather(variable, value, -gr) %>%
  arrange(gr) %>%
  mutate(col_name = variable[which.max(value)])

# A tibble: 18 x 4
# Groups:   gr [2]
   gr    variable value col_name
   <fct> <chr>    <dbl> <chr>   
 1 1     A            5 A       
 2 1     A           NA A       
 3 1     A           NA A       
 4 1     B           NA A       
 5 1     B            2 A       
 6 1     B           NA A       
 7 1     C           NA A       
 8 1     C           NA A       
 9 1     C            1 A       
10 2     A            1 A       
11 2     A           NA A       
12 2     A           NA A       
13 2     B           NA A       
14 2     B            1 A       
15 2     B           NA A       
16 2     C           NA A       
17 2     C           NA A       
18 2     C            1 A 
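
A note on why col_name comes out as A in every row: which.max() returns only the position of the first maximum, so ties between columns are never visible. The base R sketch below illustrates this with the per-column maxima of gr=3; the names col_max, first_only and all_tied are illustrative only.

```r
# Per-column maxima for gr = 3 in the sample data
col_max <- c(A = 3, B = 1, C = 3)

# which.max() stops at the first maximum, so the tie with C is lost
first_only <- names(col_max)[which.max(col_max)]

# Comparing every element against max() keeps all tied columns
all_tied <- paste(names(col_max)[col_max == max(col_max)], collapse = "&")

print(first_only)  # "A"
print(all_tied)    # "A&C"
```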

What I am struggling with is how to output the all_equal label when the maximum values in columns A, B and C are all equal, and, when two columns share the maximum (A and C in gr=3), how to output those column names in the format A&C in col_name.

The expected output would be

> test
       A  B  C gr  col_name
    1  5 NA NA  1     A
    2 NA  2 NA  1     A
    3 NA NA  1  1     A
    4  1 NA NA  2  all_equal
    5 NA  1 NA  2  all_equal 
    6 NA NA  1  2  all_equal
    7  3 NA NA  3  A&C
    8 NA  1 NA  3  A&C 
    9 NA NA  3  3  A&C

thx in advance!

Alexander asked Aug 09 '18 22:08


1 Answer

Here is a dplyr approach that I tried to make a little more generalized to accommodate a different number of columns of interest. With your test data frame from above, start by defining a function that finds the max of the current group, gets indices for columns with matching values, then builds the output based on the number of matching columns:

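foo <- function(df_, cols = 1:3) {
  # Largest value across the selected columns of this group, ignoring NAs
  m <- max(df_[, cols], na.rm = TRUE)

  # Column positions (within cols) of every cell equal to the maximum;
  # unique() keeps a column from being counted twice if it holds the max in several rows
  ix <- unique(which(df_[, cols] == m, arr.ind = TRUE)[, "col"])
  columns <- names(df_[, cols])[ix]

  # "all_equal" when every selected column reaches the maximum, otherwise join the names
  out <- if (length(ix) == length(cols)) "all_equal" else paste(columns, collapse = "&")
  df_$col_name <- out
  return(df_)
}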
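
To make those steps concrete, here is a base R walkthrough of the same logic on a single group (gr=3) from the sample data. It is only a sketch; the names g3, m, hits, cols_hit and label are illustrative and not part of the function.

```r
test <- data.frame(A = c(5, NA, NA, 1, NA, NA, 3, NA, NA),
                   B = c(NA, 2, NA, NA, 1, NA, NA, 1, NA),
                   C = c(NA, NA, 1, NA, NA, 1, NA, NA, 3),
                   gr = gl(3, 3))

# Numeric columns of the third group only
g3 <- test[test$gr == 3, c("A", "B", "C")]

# Step 1: maximum over all cells of the group, ignoring NAs
m <- max(g3, na.rm = TRUE)

# Step 2: positions of the cells equal to the maximum ("row" and "col" columns)
hits <- which(g3 == m, arr.ind = TRUE)

# Step 3: turn the column positions into names and join them
cols_hit <- names(g3)[unique(hits[, "col"])]
label <- paste(cols_hit, collapse = "&")

print(m)      # 3
print(label)  # "A&C"
```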

Because the function returns a data frame, use do() to apply it to each group with dplyr:

test %>%
  group_by(gr) %>%
  do(foo(.))

# A tibble: 9 x 5
# Groups:   gr [3]
      A     B     C gr    col_name 
  <dbl> <dbl> <dbl> <fct> <chr>    
1     5    NA    NA 1     A        
2    NA     2    NA 1     A        
3    NA    NA     1 1     A        
4     1    NA    NA 2     all_equal
5    NA     1    NA 2     all_equal
6    NA    NA     1 2     all_equal
7     3    NA    NA 3     A&C      
8    NA     1    NA 3     A&C      
9    NA    NA     3 3     A&C 
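
As a side note, do() is marked as superseded in dplyr 1.0.0 and later, and group_modify() may be a drop-in alternative in recent versions. The sketch below is self-contained and uses a hypothetical helper, label_max_cols(), that compares per-column maxima rather than individual cells; treat it as one possible variant under those assumptions, not as the method above. Note that group_modify() passes each group without the grouping column and adds gr back as the first column of the result.

```r
library(dplyr)

test <- data.frame(A = c(5, NA, NA, 1, NA, NA, 3, NA, NA),
                   B = c(NA, 2, NA, NA, 1, NA, NA, 1, NA),
                   C = c(NA, NA, 1, NA, NA, 1, NA, NA, 3),
                   gr = gl(3, 3))

# Hypothetical helper: label one group by the column(s) holding its maximum
label_max_cols <- function(df_, cols = c("A", "B", "C")) {
  # Maximum of each selected column, ignoring NAs
  col_max <- sapply(df_[cols], max, na.rm = TRUE)
  # Every column whose maximum equals the overall maximum
  tied <- names(col_max)[col_max == max(col_max)]
  df_$col_name <- if (length(tied) == length(cols)) {
    "all_equal"
  } else {
    paste(tied, collapse = "&")
  }
  df_
}

res <- test %>%
  group_by(gr) %>%
  group_modify(~ label_max_cols(.x)) %>%
  ungroup()

print(res)
```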

The function should allow for a flexible number of columns to be input, as long as they're numeric. For example,

test %>%
  group_by(gr) %>%
  do(foo(., cols = 1:2))

and

test %>%
  group_by(gr) %>%
  do(foo(., cols = c(1,3)))

both seem to work.

Edit:

Yes, you can also pass column names instead of positions:

test %>%
  group_by(gr) %>%
  do(foo(., cols = c("A", "B", "C")))

Luke C answered Sep 22 '22 02:09