
Check condition and return name of column for which the condition is fulfilled

Tags: r, filter, dplyr

I have a dataframe that looks like this:

df_start <- data.frame(
  a = c(1, 1, 1, 1, 1), 
  b = c(0, 1, 0, 0, 0), 
  c = c(1, 0, 0, 0, 0), 
  n = c(0, 0, 0, 1, 0))

I want to test whether any of the columns in df_start[, 2:n] (where n is the index of the last column of the data frame) is equal to df_start$a, and then create two new columns: the first returns 1 if the condition is TRUE and 0 if it is not, and the second gives the name of the column for which the condition was TRUE.

I managed to create the first column like this:

library(dplyr)

# check condition
df_start <- df_start %>% mutate(cond = ifelse(a == b | a == c | a == n, 1, 0))

However, I think I need a different approach, since I may have a different number of columns each time. So I need to test the condition between column a and all columns from the second to the last one, and I also need to know for which column the condition was fulfilled.

Desired output:

# desired output
df_end <- data.frame(a = c(1, 1, 1, 1, 1), 
                     b = c(0, 1, 0, 0, 0), 
                     c = c(1, 0, 0, 0, 0), 
                     n = c(0, 0, 0, 1, 0),
                     cond = c(1,1,0,1,0),
                     col_name = c("c", "b", NA, "n", NA))

Is there a way to do this with dplyr or base R? Any other solutions are also appreciated.

adl asked Jan 29 '26 09:01



1 Answer

A base R solution:

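# logical matrix: TRUE where a column equals column a in that row
m <- df_start[,1] == df_start[,2:4]

# number of matching columns per row (0 or 1 for this data)
df_start$cond <- rowSums(m)
# for rows with a match, look up the name of the matching column
df_start$col_name[!!rowSums(m)] <- names(df_start[2:4])[max.col(m) * rowSums(m)]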

which gives:

> df_start
  a b c n cond col_name
1 1 0 1 0    1        c
2 1 1 0 0    1        b
3 1 0 0 0    0     <NA>
4 1 0 0 1    1        n
5 1 0 0 0    0     <NA>
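
Since the question mentions that the number of columns may change, the same idea can be sketched without hard-coding 2:4. This is only a sketch under a few assumptions that are not in the original answer: the reference column is always named a, the data contain no NA values, and when several columns match in one row the leftmost match is the one to report (that is what ties.method = "first" does). The helper names others and hit are made up for illustration.

```r
# Sample data from the question
df_start <- data.frame(
  a = c(1, 1, 1, 1, 1),
  b = c(0, 1, 0, 0, 0),
  c = c(1, 0, 0, 0, 0),
  n = c(0, 0, 0, 1, 0))

# Columns to compare against `a`: every column except `a` itself,
# so the code does not depend on how many columns there are
others <- setdiff(names(df_start), "a")

# Logical matrix: TRUE where a column equals `a` in that row
# (assumes no NA values in the data)
m <- as.matrix(df_start[others]) == df_start$a

# 1 if at least one column matches, 0 otherwise
hit <- rowSums(m) > 0
df_start$cond <- as.integer(hit)

# Name of the first (leftmost) matching column, NA where nothing matches
df_start$col_name <- NA_character_
df_start$col_name[hit] <- others[max.col(m, ties.method = "first")[hit]]
```

Compared with the code above, cond stays 0/1 even if more than one column matches in a row, and the explicit ties.method avoids the random tie-breaking that max.col uses by default.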
Jaap answered Jan 30 '26 23:01
