I'm looking for an algorithm to create a new column based on values from other columns AND respecting pre-established rules. Here's an example:
df = data.frame(
col_1 = c('No','Yes','Yes','Yes','Yes','Yes','No','No','No','Unknown'),
col_2 = c('Yes','Yes','Unknown','Yes','Unknown','No','Unknown','No','Unknown','Unknown'),
col_3 = c('Unknown','Yes','Yes','Unknown','Unknown','No','No','Unknown','Unknown','Unknown')
)
Current solution:
library(dplyr)
df_1 <-
df %>%
mutate(
new_column = ifelse(
(col_1 == 'Yes' | col_2 == 'Yes' | col_3 == 'Yes'), 'Yes',
ifelse(
(col_1 == 'Unknown' & col_2 == 'Unknown' & col_3 == 'Unknown'), 'Unknown','No'
)
)
)
After searching for StackOverflow, I couldn't find a way to my problem (I know there are several posts about creating a new column based on values obtained from different columns, but none). Perhaps the search strategy was not the best. If anyone finds it, please provide the link.
I used R in the code, but the current solution works in Python using np.where. Solutions in R or Python are welcome.
Try this using dplyr
rowwise
function
library(dplyr)
df |> rowwise() |> mutate(new_column = case_when(any(c_across() == "Yes") ~ "Yes" ,
any(c_across() == "No") ~ "No" , TRUE ~ "Unknown")) |> ungroup()
# A tibble: 10 × 4
col_1 col_2 col_3 new_column
<chr> <chr> <chr> <chr>
1 No Yes Unknown Yes
2 Yes Yes Yes Yes
3 Yes Unknown Yes Yes
4 Yes Yes Unknown Yes
5 Yes Unknown Unknown Yes
6 Yes No No Yes
7 No Unknown No No
8 No No Unknown No
9 No Unknown Unknown No
10 Unknown Unknown Unknown Unknown
df <- structure(list(col_1 = c("No", "Yes", "Yes", "Yes", "Yes", "Yes",
"No", "No", "No", "Unknown"), col_2 = c("Yes", "Yes", "Unknown",
"Yes", "Unknown", "No", "Unknown", "No", "Unknown", "Unknown"
), col_3 = c("Unknown", "Yes", "Yes", "Unknown", "Unknown", "No",
"No", "Unknown", "Unknown", "Unknown")), class = "data.frame", row.names = c(NA,
-10L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With