Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drop a column if all of its values are NA

Tags:

r

tidyverse

I've got a database query running from R which can have multiple columns filled with NA, some of these columns I need further down the line but one specific column can be dropped if all values are NA.

I usually use purrr::discard(~all(is.na(.))) to drop the columns which are all NA, but since this dataframe can contain multiple columns with NA where I only want to remove one I'm struggling to make this specific to the column in a tidyverse solution.

I've currently got this workaround:

  if(sum(is.na(Orders$Originator)) == nrow(Orders)) {
    
    Orders <- Orders %>%
      select(-Originator)
    
  }

But it would improve the readability if I can have this in a tidyverse solution. Hope someone can be of help!

Thanks!

like image 688
Bob Janson Avatar asked Sep 20 '25 23:09

Bob Janson


1 Answers

The canonical tidyverse way to address this problem would be to make use of a predicate function used within select(where(...)) and combine this with selection by variable name.

First we could write a custom predicate function to use in where which selects only columns that contain only NAs.

# custom predicate function
all_na <- function(x) {
  all(is.na(x))
}

We can use this function together with a boolean expression saying we don't want to select y if (read AND &) it is all_na:

library(dplyr)

df <- data.frame(
  x = c(1,2,NA),
  y = NA,
  z = c(3,4,5)
)

df %>% 
  select(!(y & where(all_na)))
#>    x z
#> 1  1 3
#> 2  2 4
#> 3 NA 5

To check whether this is really working let's redefine y so that it contains not only NAs and we will see that this time it's not deselected:

df2 <- data.frame(
  x = c(1,2,NA),
  y = c(1,2,NA),
  z = c(3,4,5)
)

df2 %>% 
  select(!(y & where(all_na)))
#>    x  y z
#> 1  1  1 3
#> 2  2  2 4
#> 3 NA NA 5

Instead of a custom function we can use a lambda function inside where:

df %>% 
  select(!(y & where(~ all(is.na(.x)))))

Created on 2021-12-07 by the reprex package (v0.3.0)


In the larger tidyverse we could also use purrr::lmap_at and select y with the .at argument and then create a lambda function saying if all(is.na(.x)) then use an empty list() (= drop the column) otherwise keep the column .x:

library(purrr)
library(dplyr)

df %>% 
  lmap_at("y", ~ if(all(is.na(.x))) list() else .x)
#> # A tibble: 3 x 2
#>       x     z
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     4
#> 3    NA     5

Created on 2021-12-07 by the reprex package (v2.0.1)

like image 200
TimTeaFan Avatar answered Sep 22 '25 14:09

TimTeaFan