I have a dataframe that looks like this:
df_start <- data.frame(
a = c(1, 1, 1, 1, 1),
b = c(0, 1, 0, 0, 0),
c = c(1, 0, 0, 0, 0),
n = c(0, 0, 0, 1, 0))
I want to test whether any of the columns in df_start[,2:n] (where n is the index of the last column of the dataframe) is equal to df_start$a, and then create two new columns: the first returns 1 if the condition is TRUE and 0 if it is not, and the second gives the name of the column for which the condition was TRUE.
I managed to create the first column like this:
library(dplyr)
# check condition
df_start <- df_start %>% mutate(cond = ifelse(a == b | a == c | a == n, 1, 0))
However, I think I need a different approach, since the number of columns may differ each time. I need to test the condition between column a and every column from the second to the last, and I also need to know which column fulfilled the condition.
Desired output:
# desired output
df_end <- data.frame(a = c(1, 1, 1, 1, 1),
b = c(0, 1, 0, 0, 0),
c = c(1, 0, 0, 0, 0),
n = c(0, 0, 0, 1, 0),
cond = c(1,1,0,1,0),
col_name = c("c", "b", NA, "n", NA))
Is there a way to do this with dplyr or base R? Any other solutions are also appreciated.
Another base R solution:
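cols <- 2:ncol(df_start) # assumes df_start holds only a and the columns to compare
m <- df_start[,1] == df_start[cols] # logical matrix: TRUE where a column equals a
df_start$cond <- as.integer(rowSums(m) > 0) # 1 if any column matches, 0 otherwise
df_start$col_name <- NA_character_
df_start$col_name[rowSums(m) > 0] <- names(df_start)[cols][max.col(m, ties.method = "first")[rowSums(m) > 0]]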
which gives:
> df_start
  a b c n cond col_name
1 1 0 1 0    1        c
2 1 1 0 0    1        b
3 1 0 0 0    0     <NA>
4 1 0 0 1    1        n
5 1 0 0 0    0     <NA>
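Since the question mentions that the number of columns can change, the same idea can be wrapped in a small helper. This is only a sketch: the function name add_match_cols is made up for illustration, and it assumes the reference column comes first and every remaining column should be compared against it. When several columns match in one row, it reports the first one.

```r
# Hypothetical helper (not from the original answer): compares the first
# column of df with every other column, however many there are.
add_match_cols <- function(df) {
  others <- names(df)[-1]                  # all columns after the first
  m <- as.matrix(df[others]) == df[[1]]    # logical matrix of matches
  hit <- rowSums(m) > 0                    # TRUE where at least one column matches
  df$cond <- as.integer(hit)               # 1 for a match, 0 otherwise
  df$col_name <- NA_character_             # stays NA where nothing matches
  # ties.method = "first" picks the leftmost matching column in each row
  df$col_name[hit] <- others[max.col(m, ties.method = "first")[hit]]
  df
}

# The data from the question
df_start <- data.frame(
  a = c(1, 1, 1, 1, 1),
  b = c(0, 1, 0, 0, 0),
  c = c(1, 0, 0, 0, 0),
  n = c(0, 0, 0, 1, 0))

res <- add_match_cols(df_start)
print(res)
```

Because the helper builds the column list from names(df), it should work unchanged if more columns are added after a, as long as it is called before cond and col_name exist in the data frame.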