I am trying to select columns where at least one row equals 1, only if the same row also has a certain value in a second column. I would prefer to achieve this using dplyr, but any computationally efficient solution is welcome.
Example:
Select columns among a1, a2, a3 containing at least one row where the value is 1 AND where column b=="B"
Example data:
rand <- function(S) {set.seed(S); sample(x = c(0, 1), size = 3, replace = TRUE)}
df <- data.frame(a1 = rand(1), a2 = rand(2), a3 = rand(3), b = c("A", "B", "A"))
Input data:
  a1 a2 a3 b
1  0  0  0 A
2  0  1  1 B
3  1  1  0 A
Desired output:
  a2 a3
1  0  0
2  1  1
3  1  0
I managed to obtain the correct output with the following code. However, it is very inefficient, and I need to run it on a very large data frame (365,000 rows × 314 columns).
df %>% select_if(function(x) any(paste0(x,.$b) == '1B'))
A solution, not using dplyr:
df[sapply(df[df$b == "B",], function(x) 1 %in% x)]
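A vectorised variant of the same idea, as a minimal sketch: subset the rows where b == "B" once, then count the 1s per column with colSums instead of looping over columns. The example data is written out explicitly here so it does not depend on the random number generator, and the helper names cand, hits and result are only illustrative. It assumes every column except b is a numeric candidate column with no missing values.

```r
# Example data from the question, written out explicitly.
df <- data.frame(a1 = c(0, 0, 1), a2 = c(0, 1, 1), a3 = c(0, 1, 0),
                 b = c("A", "B", "A"))

# Candidate columns: everything except b (an assumption about the layout).
cand <- setdiff(names(df), "b")

# Keep only the rows where b == "B", then test each column for at least one 1.
hits <- colSums(df[df$b == "B", cand, drop = FALSE] == 1) > 0

# Select the matching columns by name; drop = FALSE keeps a data frame
# even if only one column matches.
result <- df[, cand[hits], drop = FALSE]
result
```

Whether this is faster than the sapply version on the full-size data would need to be measured; both subset the rows first, which is where most of the saving is likely to come from.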
Here is my dplyr solution:
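library(dplyr)
# Reshape to long format, then keep the variables that have a 1 in a row where b == "B"
ids <- df %>%
  reshape2::melt(id.vars = "b") %>%
  filter(value == 1 & b == "B") %>%
  select(variable)
# Index by name: a factor used as an index is treated as integer codes, not column names
df[, unique(as.character(ids$variable)), drop = FALSE]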
# a2 a3
#1 0 0
#2 1 1
#3 1 0
As suggested by @docendo-discimus, it is easier to convert to long format.
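The long-format idea can also be sketched with tidyr::pivot_longer in place of reshape2::melt. This is only an illustration, assuming dplyr (1.0.0 or later) and tidyr are installed; the names keep and result are hypothetical. Filtering on b before reshaping keeps the long table small, which may matter on a data frame with hundreds of columns.

```r
library(dplyr)
library(tidyr)

# Example data from the question, written out explicitly.
df <- data.frame(a1 = c(0, 0, 1), a2 = c(0, 1, 1), a3 = c(0, 1, 0),
                 b = c("A", "B", "A"))

# Names of the columns that contain a 1 in at least one row where b == "B".
keep <- df %>%
  filter(b == "B") %>%                       # drop irrelevant rows before reshaping
  pivot_longer(cols = -b, names_to = "variable", values_to = "value") %>%
  filter(value == 1) %>%
  distinct(variable) %>%                     # one entry per column name
  pull(variable)                             # character vector of column names

# Select those columns from the original data frame.
result <- df %>% select(all_of(keep))
result
```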