R get rows based on multiple conditions - use dplyr and reshape2

df <- data.frame(
  exp = c(1, 1, 2, 2),
  name = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

In trying to get accustomed to dplyr and reshape2, I stumbled over a "simple" task: selecting rows based on several conditions. I want the genes (the name variable) that have value above 0 in experiment 1 (exp == 1) AND, at the same time, value below 0 in experiment 2; in df this would be "gene2". Surely there are many ways to do this, e.g. subset df for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the resulting subsets:

library(dplyr)
# Select the name column explicitly; it is not the first column of the joined result
inner_join(filter(df, exp == 1 & value > 0), filter(df, exp == 2 & value < 0), by = "name")$name

Although this works, it looks very awkward. I feel that this kind of conditional filtering lies at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me here?

user3375672 asked Dec 01 '14 15:12


1 Answer

One alternative that comes to mind is to transform the data to a "wide" format and then do the filtering.

Here's an example using "data.table" (for the convenience of compound-statements):

library(data.table)
dcast(as.data.table(df), name ~ exp, value.var = "value")[`1` > 0 & `2` < 0]
#     name 1  2
# 1: gene2 1 -1

Similarly, with "dplyr" and "tidyr":

library(dplyr)
library(tidyr)
df %>% 
  spread(exp, value) %>% 
  filter(`1` > 0 & `2` < 0)
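
For readers on a newer tidyr, the same wide-then-filter idea can be sketched with pivot_wider(), which supersedes spread(). This is only a sketch under assumptions: it needs tidyr 1.0.0 or later, and the "exp" prefix and the resulting column names exp1 and exp2 are my own choice (not part of the original data) to avoid backtick-quoted numeric column names.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# One row per gene, one column per experiment.
# names_prefix = "exp" is an illustrative choice; it yields columns exp1 and exp2.
wide <- df %>%
  pivot_wider(names_from = exp, values_from = value, names_prefix = "exp")

# Keep genes that are positive in experiment 1 and negative in experiment 2
hits <- wide %>%
  filter(exp1 > 0, exp2 < 0) %>%
  pull(name)

print(hits)
```

Separating the conditions in filter() with a comma is equivalent to combining them with &. The approach assumes each gene appears at most once per experiment; with duplicates, pivot_wider() would produce list-columns and the comparison would no longer work as written.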
A5C1D2H2I1M1N2O1R2T1 answered Nov 15 '22 07:11