I have the following data frame:
dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129,
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106,
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX",
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR",
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")
dat
#>                     A-XXX  fBM-XXX    P-XXX  vBM-XXX
#> BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
#> BATF::JUN_IL10 0.00000000 0.000000 0.000000 0.000000
I can remove the rows in which every column is zero with this command:
> dat <- dat[ rowSums(dat)!=0, ]
> dat
                    A-XXX  fBM-XXX    P-XXX  vBM-XXX
BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
But how can I do it with dplyr's pipe style?
For example, if we have a data frame called df, we can remove the rows that contain at least one 0 with the command df[apply(df, 1, function(x) all(x != 0)), ].
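Note that "at least one 0" is a stricter rule than the one in the question, which only wants rows dropped when every column is zero. A minimal base R sketch of both variants on the question's data (the names no_zero and not_all_zero are only illustrative):

```r
# Rebuild the data frame from the question
dat <- data.frame(
  `A-XXX`   = c(1.51653275922944, 0.077037240321129, 0),
  `fBM-XXX` = c(2.22875185527511, 0, 0),
  `P-XXX`   = c(1.73356698481106, 0, 0),
  `vBM-XXX` = c(3.00397859609183, 0, 0),
  row.names = c("BATF::JUN_AHR", "BATF::JUN_CCR9", "BATF::JUN_IL10"),
  check.names = FALSE
)

# Drop rows that contain at least one zero: all() keeps only fully non-zero rows
no_zero <- dat[apply(dat, 1, function(x) all(x != 0)), ]

# Drop only rows that are zero in every column: any() keeps rows with a non-zero value
not_all_zero <- dat[apply(dat, 1, function(x) any(x != 0)), ]
```

With this data, no_zero keeps only BATF::JUN_AHR, while not_all_zero keeps BATF::JUN_AHR and BATF::JUN_CCR9.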
If we prefer to work with the tidyverse, we can use dplyr's filter() function to remove (or select) rows based on the values in a column, that is, conditionally, much like subset(). We can also use dplyr's slice() function to remove rows by index.
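As a rough illustration of those two verbs on the question's data (the result names by_value and by_index are made up for this sketch):

```r
library(dplyr)

# Rebuild the data frame from the question
dat <- data.frame(
  `A-XXX`   = c(1.51653275922944, 0.077037240321129, 0),
  `fBM-XXX` = c(2.22875185527511, 0, 0),
  `P-XXX`   = c(1.73356698481106, 0, 0),
  `vBM-XXX` = c(3.00397859609183, 0, 0),
  row.names = c("BATF::JUN_AHR", "BATF::JUN_CCR9", "BATF::JUN_IL10"),
  check.names = FALSE
)

# filter(): keep rows by a condition on one column
# (backticks are needed because the column name contains a hyphen)
by_value <- dat %>% filter(`A-XXX` != 0)

# slice(): drop rows by position, here the third row
by_index <- dat %>% slice(-3)
```

Both calls leave the first two rows here, but only filter() looks at the values; slice() just counts positions.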
To remove observations with missing values in at least one column, you can use the na.omit() function. na.omit() inspects all columns of a data frame and drops the rows that have an NA in one or more columns.
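A small sketch of that behaviour, using a made-up data frame (df, x and y are hypothetical and not part of the question's data):

```r
# Toy data with one NA in each of two different rows
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", NA)
)

# na.omit() drops every row that has an NA in any column
complete <- na.omit(df)
```

Only the first row has no NA, so it is the only one left in complete.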
Here's a dplyr option:
library(dplyr)
filter_all(dat, any_vars(. != 0))
#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000
The logic here: keep a row if any of its variables is not equal to zero. That is the same as removing the rows where all variables are equal to zero.
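To check that equivalence, here is a sketch comparing the dplyr result with a base R filter that counts the non-zero values per row. The second half uses a made-up data frame (mixed) to show one case where the rowSums(dat) != 0 test from the question would differ: values that cancel out sum to zero even though the row is not all zeros.

```r
library(dplyr)

# Rebuild the data frame from the question
dat <- data.frame(
  `A-XXX`   = c(1.51653275922944, 0.077037240321129, 0),
  `fBM-XXX` = c(2.22875185527511, 0, 0),
  `P-XXX`   = c(1.73356698481106, 0, 0),
  `vBM-XXX` = c(3.00397859609183, 0, 0),
  row.names = c("BATF::JUN_AHR", "BATF::JUN_CCR9", "BATF::JUN_IL10"),
  check.names = FALSE
)

# dplyr: keep rows where any variable is non-zero
kept_dplyr <- filter_all(dat, any_vars(. != 0))

# base R: count the non-zero values per row and keep rows with at least one
kept_base <- dat[rowSums(dat != 0) > 0, ]

# Compare the values only, ignoring row and column names
same_rows <- isTRUE(all.equal(unname(as.matrix(kept_dplyr)),
                              unname(as.matrix(kept_base))))

# Hypothetical data with a positive and a negative value that cancel out
mixed <- data.frame(a = c(1, 0), b = c(-1, 0))
by_sum   <- rowSums(mixed) != 0       # FALSE for both rows: 1 + (-1) is 0
by_count <- rowSums(mixed != 0) > 0   # TRUE for the first row only
```

For the non-negative values in the question both tests agree, so this only matters if the data can contain negative numbers.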
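Regarding row.names, which are lost in the output above: move them into a column first and leave that column out of the test: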
library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
#         rowname      A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1  BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
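If the result should have real row names again rather than a rowname column, one possible follow-up is tibble's column_to_rownames(), which by default reads the column called rowname. A sketch, with res as an illustrative name:

```r
library(dplyr)
library(tibble)

# Rebuild the data frame from the question
dat <- data.frame(
  `A-XXX`   = c(1.51653275922944, 0.077037240321129, 0),
  `fBM-XXX` = c(2.22875185527511, 0, 0),
  `P-XXX`   = c(1.73356698481106, 0, 0),
  `vBM-XXX` = c(3.00397859609183, 0, 0),
  row.names = c("BATF::JUN_AHR", "BATF::JUN_CCR9", "BATF::JUN_IL10"),
  check.names = FALSE
)

res <- dat %>%
  rownames_to_column() %>%                         # row names -> column "rowname"
  filter_at(vars(-rowname), any_vars(. != 0)) %>%  # test every column except "rowname"
  column_to_rownames()                             # column "rowname" -> row names
```

The round trip leaves the original four columns and puts the two surviving labels back as row names.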
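Adding to the answer by @mgrund, an alternative with dplyr 1.0.0 is:
# Option A: keep rows that have at least one non-zero value
dat %>% filter(rowSums(across(everything()) != 0) > 0)
# Option B: test each column for zero, then drop rows where every test is TRUE
dat %>% filter(rowSums(across(everything(), ~ . == 0)) < ncol(dat))
Explanation: across() works on a tidyselect selection of columns; here the selection is everything(), which represents every column. In Option A, every value is compared with zero and rowSums() counts the non-zero values per row, so a row is kept as long as that count is above zero. In Option B, the formula (~) is applied to every column and checks whether the current column is zero; a row is dropped only when all of its columns pass that check, which makes it a complete row of zeros.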
EDIT:
As filter() already checks by row, you don't need rowwise(). This is different for select() or mutate().
IMPORTANT:
In Option A, it is crucial to write across(everything()) != 0, and NOT across(everything() != 0)!
Reason: across() requires a tidyselect expression (here everything()), not a boolean (which is what everything() != 0 would be).
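For what it's worth, later dplyr releases added if_any() and if_all(), which express the "any column" and "all columns" conditions directly inside filter(). The sketch below assumes dplyr 1.0.4 or newer (these helpers are not available in 1.0.0); kept_any and kept_not_all are just illustrative names:

```r
library(dplyr)

# Rebuild the data frame from the question
dat <- data.frame(
  `A-XXX`   = c(1.51653275922944, 0.077037240321129, 0),
  `fBM-XXX` = c(2.22875185527511, 0, 0),
  `P-XXX`   = c(1.73356698481106, 0, 0),
  `vBM-XXX` = c(3.00397859609183, 0, 0),
  row.names = c("BATF::JUN_AHR", "BATF::JUN_CCR9", "BATF::JUN_IL10"),
  check.names = FALSE
)

# Keep rows where at least one column is non-zero
kept_any <- dat %>% filter(if_any(everything(), ~ .x != 0))

# Same result, phrased as "drop rows where every column is zero"
kept_not_all <- dat %>% filter(!if_all(everything(), ~ .x == 0))
```

Both calls keep the first two rows of the example data; which phrasing reads better is mostly a matter of taste.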