How do I remove NAs and empty values by column names?

Question

I have the following data frame:

a <- c(3, 2, 1)
a_1 <- c(NA, "", NA)
b <- c(3, 4, 1)
b_1 <- c(3, NA, 4)
c <- c("", "", "")
c_1 <- c(5, 8, 9)
d <- c(6, 9, 10)
d_1 <- c("", "", "")
e <- c(NA, NA, NA)
e_1 <- c(NA, NA, NA)

df <- data.frame(a, a_1, b, b_1, c, c_1, 
                 d, d_1, e, e_1)

I want to remove columns that contain "_1" with only empty cells and NAs. However, I've only been able to find the code that removes ALL columns in the data frame that contain empty cells and NAs.

empty_columns <- colSums(is.na(df) | 
        df == "") == nrow(df)
df[, !empty_columns]
df <- df[, colSums(is.na(df)) < nrow(df)]

  a b b_1 c_1  d
1 3 3   3   5  6
2 2 4  NA   8  9
3 1 1   4   9 10

But I want my resulting data frame to look like this:

df2 <- data.frame(a, b, b_1, c, c_1, d, e)

  a b b_1 c c_1  d  e
1 3 3   3     5  6 NA
2 2 4  NA     8  9 NA
3 1 1   4     9 10 NA

Andrew Gustar · Accepted Answer

You could redefine empty_columns to check whether the column name contains "_1"...

empty_columns <- colSums(is.na(df) | df == "") == nrow(df) & grepl("_1", names(df))

akrun · Answer

Here is one option with tidyverse, select the columns that ends_with ("_1") and (&) those have all NA values (after converting the blanks ("") to NA with na_if, then remove those columns with -

library(dplyr)
df %>%
    select(-(ends_with("_1") & where(~ all(is.na(na_if(as.character(.x), ""))))))

-output

  a b b_1 c c_1  d  e
1 3 3   3     5  6 NA
2 2 4  NA     8  9 NA
3 1 1   4     9 10 NA

How do I remove NAs and empty values by column names?

Tags:

dataframe

r

missing-data

hy9fesh

2 Answers

Andrew Gustar

akrun

Recent Activity

Donate For Us

How do I remove NAs and empty values by column names?

Tags:

dataframe

r

missing-data

hy9fesh

2 Answers

Andrew Gustar

akrun

Related questions

Recent Activity

Donate For Us