Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove rows where all columns are zero using dplyr pipe

Tags:

r

dplyr

tidyverse

I have the following data frame:

dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129, 
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106, 
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX", 
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR", 
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")

dat 
#>                     A-XXX  fBM-XXX    P-XXX  vBM-XXX
#> BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
#> BATF::JUN_IL10 0.00000000 0.000000 0.000000 0.000000

I can remove the row with all column zero with this command:

> dat <- dat[ rowSums(dat)!=0, ]
> dat
                    A-XXX  fBM-XXX    P-XXX  vBM-XXX
BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

But how can I do it with dplyr's pipe style?

like image 733
scamander Avatar asked Mar 15 '18 06:03

scamander


People also ask

How do I remove rows with zero values in R?

Data Visualization using R Programming For example, if we have a data frame called df then we can remove rows that contain at least one 0 can be done by using the command df[apply(df,1, function(x) all(x!= 0)),].

How do I remove rows with dplyr in R?

If we prefer to work with the Tidyverse package, we can use the filter() function to remove (or select) rows based on values in a column (conditionally, that is, and the same as using subset). Furthermore, we can also use the function slice() from dplyr to remove rows based on the index.

How do I remove Na values from all columns in R?

To remove observations with missing values in at least one column, you can use the na. omit() function. The na. omit() function in the R language inspects all columns from a data frame and drops rows that have NA's in one or more columns.


2 Answers

Here's a dplyr option:

library(dplyr)
filter_all(dat, any_vars(. != 0))

#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000

Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.

Regarding row.names:

library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
#         rowname      A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1  BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
like image 153
talat Avatar answered Oct 06 '22 22:10

talat


Adding to the answer by @mgrund, a shorter alternative with dplyr 1.0.0 is:

# Option A:
data %>% filter(across(everything(.)) != 0))

# Option B:
data %>% filter(across(everything(.), ~. == 0))

Explanation:
across() checks for every tidy_select variable, which is everything() representing every column. In Option A, every column is checked if not zero, which adds up to a complete row of zeros in every column. In Option B, on every column, the formula (~) is applied which checks if the current column is zero.

EDIT:
As filter already checks by row, you don't need rowwise(). This is different for select or mutate.

IMPORTANT:
In Option A, it is crucial to write across(everything(.)) != 0,
and NOT across(everything(.) != 0))!

Reason:
across requires a tidyselect variable (here everything()), not a boolean (which would be everything(.) != 0))

like image 26
Agile Bean Avatar answered Oct 06 '22 22:10

Agile Bean