I've got a database query running from R which can have multiple columns filled with NA, some of these columns I need further down the line but one specific column can be dropped if all values are NA.
I usually use purrr::discard(~all(is.na(.))) to drop the columns which are all NA, but since this dataframe can contain multiple columns with NA where I only want to remove one I'm struggling to make this specific to the column in a tidyverse solution.
I've currently got this workaround:
if(sum(is.na(Orders$Originator)) == nrow(Orders)) {
Orders <- Orders %>%
select(-Originator)
}
But it would improve the readability if I can have this in a tidyverse solution. Hope someone can be of help!
Thanks!
The canonical tidyverse way to address this problem would be to make use of a predicate function used within select(where(...))
and combine this with selection by variable name.
First we could write a custom predicate function to use in where
which selects only columns that contain only NA
s.
# custom predicate function
all_na <- function(x) {
all(is.na(x))
}
We can use this function together with a boolean expression saying we don't want to select
y
if (read AND &
) it is all_na
:
library(dplyr)
df <- data.frame(
x = c(1,2,NA),
y = NA,
z = c(3,4,5)
)
df %>%
select(!(y & where(all_na)))
#> x z
#> 1 1 3
#> 2 2 4
#> 3 NA 5
To check whether this is really working let's redefine y
so that it contains not only NA
s and we will see that this time it's not deselected:
df2 <- data.frame(
x = c(1,2,NA),
y = c(1,2,NA),
z = c(3,4,5)
)
df2 %>%
select(!(y & where(all_na)))
#> x y z
#> 1 1 1 3
#> 2 2 2 4
#> 3 NA NA 5
Instead of a custom function we can use a lambda function inside where
:
df %>%
select(!(y & where(~ all(is.na(.x)))))
Created on 2021-12-07 by the reprex package (v0.3.0)
In the larger tidyverse we could also use purrr::lmap_at
and select y
with the .at
argument and then create a lambda function saying if all(is.na(.x))
then use an empty list()
(= drop the column) otherwise keep the column .x
:
library(purrr)
library(dplyr)
df %>%
lmap_at("y", ~ if(all(is.na(.x))) list() else .x)
#> # A tibble: 3 x 2
#> x z
#> <dbl> <dbl>
#> 1 1 3
#> 2 2 4
#> 3 NA 5
Created on 2021-12-07 by the reprex package (v2.0.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With