Filter by testing logical condition across multiple columns

Tags: r, dplyr

Is there a function in dplyr that allows you to test the same condition against a selection of columns?

Take the following dataframe:

Demo1 <- c(8,9,10,11)
Demo2 <- c(13,14,15,16)
Condition <- c('A', 'A', 'B', 'B')
Var1 <- c(13,76,105,64)
Var2 <- c(12,101,23,23)
Var3 <- c(5,5,5,5)

# data.frame() keeps each column's type, so no cbind() coercion or reconversion is needed
df <- data.frame(Demo1, Demo2, Condition, Var1, Var2, Var3, stringsAsFactors = FALSE)

I want to keep every row in which at least one of Var1, Var2, or Var3 is greater than 100. I realise that I could do this with a series of OR conditions, like so:

library(dplyr)

df <- df %>%
  filter(Var1 > 100 | Var2 > 100 | Var3 > 100)

However, since I have quite a few columns in my actual dataset this would be time-consuming. I am assuming that there is some reasonably straightforward way to do this but haven't been able to find a solution on SO.

userLL asked May 24 '18 06:05


2 Answers

We can do this with filter_at and any_vars:

df %>%
  filter_at(vars(matches("^Var")), any_vars(. > 100))
#   Demo1 Demo2 Condition Var1 Var2 Var3
#1     9    14         A   76  101    5
#2    10    15         B  105   23    5
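
On newer versions of dplyr (1.0.4 and later, as far as I know), scoped verbs such as filter_at are superseded, and the same row filter can be sketched with if_any(). The starts_with("Var") selector is an assumption about how the columns are named, and the data frame is rebuilt here only so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question so the snippet is self-contained
df <- data.frame(
  Demo1 = c(8, 9, 10, 11),
  Demo2 = c(13, 14, 15, 16),
  Condition = c("A", "A", "B", "B"),
  Var1 = c(13, 76, 105, 64),
  Var2 = c(12, 101, 23, 23),
  Var3 = c(5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Keep rows where at least one column whose name starts with "Var" exceeds 100
res <- df %>%
  filter(if_any(starts_with("Var"), ~ .x > 100))

res
```

Swapping if_any() for if_all() should keep only the rows where every selected column exceeds 100.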

Or, using base R, build a logical vector with lapply and Reduce and use it to subset the rows:

df[Reduce(`|`, lapply(df[grepl("^Var", names(df))], `>`, 100)), ]
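
For readers unfamiliar with Reduce, the one-liner can be unpacked into separate steps. This is only a sketch of the same idea; the intermediate names (var_cols, over_100, keep) are made up for illustration.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Demo1 = c(8, 9, 10, 11),
  Demo2 = c(13, 14, 15, 16),
  Condition = c("A", "A", "B", "B"),
  Var1 = c(13, 76, 105, 64),
  Var2 = c(12, 101, 23, 23),
  Var3 = c(5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# 1. Select the columns whose names start with "Var"
var_cols <- df[grepl("^Var", names(df))]

# 2. Compare each column with 100: a list of logical vectors, one per column
over_100 <- lapply(var_cols, `>`, 100)

# 3. Combine the vectors element-wise with OR: TRUE if any column is TRUE for that row
keep <- Reduce(`|`, over_100)

df[keep, ]
```

Using `&` in place of `|` in the last step would instead require every Var column to exceed 100.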
akrun answered Oct 27 '22 18:10


In base R, the same filter can be written with rowSums:

df[rowSums(df[, grepl("^Var", names(df))] > 100) >= 1, ]

#   Demo1 Demo2 Condition Var1 Var2 Var3
# 2     9    14         A   76  101    5
# 3    10    15         B  105   23    5
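
One caveat with the rowSums approach: if the real data contains missing values, rowSums() returns NA for any row that has one, and indexing with NA produces a row of NAs. A minimal sketch of a more defensive variant follows, using a hypothetical data frame df_na that is not part of the question; na.rm = TRUE ignores the missing values, and drop = FALSE keeps a data frame even when only one column matches.

```r
# Hypothetical data with missing values (not from the question)
df_na <- data.frame(
  Condition = c("A", "A", "B", "B"),
  Var1 = c(13, NA, 105, 64),
  Var2 = c(12, 101, 23, NA),
  stringsAsFactors = FALSE
)

# Logical index of the columns whose names start with "Var"
var_cols <- grepl("^Var", names(df_na))

# Count, per row, how many Var columns exceed 100, ignoring NAs
hits <- rowSums(df_na[, var_cols, drop = FALSE] > 100, na.rm = TRUE)

# Keep rows with at least one value above 100
res_na <- df_na[hits >= 1, ]
res_na
```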
MKR answered Oct 27 '22 18:10