Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering rows in dataframe based on partial column name and mathematical expression

Tags:

r

I'm trying to find a way to filter rows using both the partial name of the column headers and a mathematical expression (x > 0). Given my data here:

    OTU_ID       X3_22L15_S   X3_22T10_W   X3_22L6_S   X3_22Algae 
1   denovo109       16            0            9            0
2   denovo147       44          484           28            0
3   denovo297        0            0            7           14
4   denovo1013       0            1            0            0

I want to include only the rows that have a value > 0 in columns with headers that end in "S" AND columns that end in "W". I've found ways to filter the columns based on a substring of column headers and ways to filter based on values in a certain column, but have not found anything that allows me to filter based on values > 0 that occur simultaneously in two sets of columns specified by different substrings.

Any help would be appreciated!

like image 750
Emily Dodd Avatar asked Mar 09 '23 17:03

Emily Dodd


1 Answers

We can use filter_at from the dplyr package. dt2 is the final output.

# Load package
library(dplyr)

# Create example data frame
dt <- read.table(text = "    OTU_ID       X3_22L15_S   X3_22T10_W   X3_22L6_S   X3_22Algae 
1   denovo109       16            0            9            0
                 2   denovo147       44          484           28            0
                 3   denovo297        0            0            7           14
                 4   denovo1013       0            1            0            0",
                 header = TRUE, stringsAsFactors = FALSE)

# Filter the data, with any column ends with "S" or "W" and values > 0
dt2 <- dt %>%
  filter_at(vars(ends_with("S")), any_vars(. > 0)) %>%
  filter_at(vars(ends_with("W")), any_vars(. > 0))
like image 200
www Avatar answered Mar 11 '23 12:03

www