Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tidyverse: case_when() not returning correct value

For a set of columns, I wish to detect which column has the max value. If the value between columns is equal, then I would like to use a weight to decide which column to select. I've tried implementing this using case_when() however it does not work. I will show this problem below using an example dataset...

Let's say I have a dataset which includes these three columns (A, B, C) that correspond to the amount of Apple, Bananas or Carrots a child eats per day.

For row I want to log the most consumed food (i.e., column with the highest value). If the value between any column is equal (e.g., 1 apple and 1 banana), then apply the following rank. Apple > Banana > Carrot, whereby if a child eats 1 Apple and 1 Banana, the log will show Apple.

I've tried implementing this in R using by using pairs of if_else statements with case_when(). However, it does not return the correct result. For example, the final row should be classified as Apple, not carrot. I'm not sure what I'm doing wrong. Please provide a Tidyverse solution to this issue and if possible, explain why my approach did not work.

library(tidyverse)

A <- c(1,1,3,3)
B <- c(2,3,1,1)
C <- c(1,1,1,2)
df <- data.frame(A,B,C)

top_food <- df %>% 
  mutate(highest = case_when(
    C > B ~ "carrot", # if carrot > banana
    C > A ~ "carrot", # if carrot > apple 
    B > A ~ "banana", # if banana > apple 
    B >= A ~ "banana", # if banana >= carrot
    A >= B ~ "apple", # if apple  >= banana
    A >= C ~ "apple" # if apple >= carrot
  )) 

> | A | B | C | HIGHEST |  |
> | 1 | 2 | 1 | banana  |  |
> | 1 | 3 | 1 | banana  |  |
> | 3 | 1 | 1 | apple   |  |
>   3 | 1 | 2 | carrot  |  |

Notes: - this is an example dataset. - I'm open to solutions with different functions but please provide Tidyverse answers as this is how I am learning R. If possible, please explain why my approach using case_when() did not work so I can learn. - It's important to maintain the shape/layout of the dataset so it can be used in software outside of R, so please do not convert to long format.

like image 828
EdyK Avatar asked May 23 '26 02:05

EdyK


2 Answers

A quick fix could contain dplyr::rowise and which.max(). The downside to this approach is that it is quite slow. Additionally I am assuming that the order of your columns reflects the rank of fruits (if there is a tie, which.max will return the first value).

A <- c(1,1,3,3,1,1)
B <- c(2,3,1,1,1,2)
C <- c(1,1,1,2,1,2)

df <- data.frame(A,B,C)
labels <- c("apple","banana","carrot")

df %>%
    dplyr::rowwise() %>%
    dplyr::mutate(top=labels[which.max(c(A,B,C))]) %>%
    dplyr::ungroup()

Another (and probably better) approach could be using max.col()

df <- data.frame(A,B,C)
labels <- c("apple","banana","carrot")
df %>%
    dplyr::mutate(top=labels[max.col(df,ties="first")])

And yet another solution independent of your column order (now I am out :D):

df <- data.frame(A,B,C)

top_food <- df %>% 
    dplyr::mutate(highest = dplyr::case_when(
        C > A & C > B ~ "carrot",
        B > A & B >= C ~ "banana",
        TRUE ~ "apple"))

your approach with case_when did not work, because case_when stops checking conditions as soon as a true condition is found. As your first condition is C>B ~ "carriot" which is true for your last row (2>1), case_when returned "carrot" and did not check the other conditions.

like image 148
sambold Avatar answered May 26 '26 05:05

sambold


One solution to your problem (I modified the data a bit to show that it works if all values are the same):

library(tidyverse)
df %>% 
  rowid_to_column() %>% 
  pivot_longer(-rowid) %>% 
  group_by(rowid) %>% 
  mutate(highest = name[value == max(value)],
         highest = case_when(var(value) == 0 & value == 1 ~ "apple",
                         var(value) == 0 & value == 2 ~ "banana",
                         var(value) == 0 & value == 3 ~ "carrot",
                         highest == "A" ~ "apple",
                         highest == "B"~ "banana",
                         highest == "C" ~ "carrot")) %>% 
  pivot_wider(names_from = name, values_from = value)

# A tibble: 4 x 5
# Groups:   rowid [4]
  rowid highest     A     B     C
  <int> <chr>   <dbl> <dbl> <dbl>
1     1 apple       1     1     1
2     2 banana      1     3     1
3     3 apple       3     1     1
4     4 apple       3     1     2

Data

A <- c(1,1,3,3)
B <- c(1,3,1,1)
C <- c(1,1,1,2)
df <- data.frame(A,B,C)
df
like image 28
Dominik S. Meier Avatar answered May 26 '26 05:05

Dominik S. Meier



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!