
Piping the removal of empty columns using dplyr

Tags:

r

dplyr

I have a data frame of participant questionnaire responses in wide format, with each column representing a particular question/item.

The data frame looks something like this:

id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

I want R to remove every column in which all values are either (1) NA or (2) blank. In other words, I do not want column Q1 (which consists entirely of NAs) or column Q5 (which consists entirely of blanks, i.e. "").
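One way to state that rule precisely is as a per-column test: a column is empty when every cell is either NA or "". The sketch below is base R only, and the helper name is_empty_col() is hypothetical, chosen just for illustration.

```r
# Sample data from the question
id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

# Hypothetical helper: TRUE when every cell is NA or "".
# For an NA cell, is.na(x) is TRUE and TRUE | NA evaluates to TRUE,
# so the element-wise result never contains NA.
is_empty_col <- function(x) all(is.na(x) | x == "")

# One TRUE/FALSE per column
empty <- vapply(df, is_empty_col, logical(1))
names(df)[empty]
# [1] "Q1" "Q5"

# Drop the empty columns (list-style subsetting keeps a data frame)
df_clean <- df[!empty]
```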

According to this thread, I can use the following to remove columns that consist entirely of NAs:

df[, !apply(is.na(df), 2, all)]

However, that solution does not handle blanks (""). Since I am doing all of this in a dplyr pipe, could someone also explain how to incorporate the above code into the pipe?

At this moment, my dplyr pipe looks like the following:

df <- df %>%
    select(everything())  # placeholder for the relevant columns I need

After that step I'm stuck, so I'm still using brackets [] outside the pipe to subset the non-NA columns.

Thanks! Much appreciated.

asked Mar 20 '18 01:03 by DTYK



1 Answer

We can use select_if with a predicate that flags columns that are all NA or all blank:

library(dplyr)
df %>%
   select_if(function(x) !(all(is.na(x)) | all(x=="")))

#  id Q2 Q3 Q4
#1  1  1 NA   
#2  2     2   
#3  3  4  3  2
#4  4  5  4  2
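In dplyr 1.0.0 and later, select_if() is superseded by select() combined with the where() helper. The sketch below shows the same idea in that style. Note that it uses a slightly different predicate, the cell-wise test is.na(x) | x == "", so that a column mixing NA and "" would also count as empty; it is a variation, not a literal translation of the select_if() call above.

```r
library(dplyr)

# Sample data from the question
id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

# Keep columns with at least one cell that is neither NA nor ""
df_clean <- df %>%
  select(where(function(x) !all(is.na(x) | x == "")))

names(df_clean)
# [1] "id" "Q2" "Q3" "Q4"
```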

Or, using the purrr-style formula shorthand instead of writing out function(x):

df %>% select_if(~!(all(is.na(.)) | all(. == "")))

You can also modify your apply call as follows:

df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]
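To keep this kind of bracket subsetting inside a dplyr pipe rather than after it, one option is the magrittr `.` placeholder (dplyr re-exports %>% from magrittr): inside a { } block at the end of the pipe, `.` stands for the data frame coming from the previous step. A minimal sketch that extends the question's is.na() matrix with a blank test; select(everything()) is only a stand-in for your own select() step.

```r
library(dplyr)

# Sample data from the question
id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

df_clean <- df %>%
  select(everything()) %>%  # placeholder for your own column selection
  # Inside { }, `.` is the piped data frame.
  # is.na(.) | . == "" flags NA-or-blank cells; apply(..., 2, all) marks
  # columns where every cell is flagged, and `!` keeps the remaining ones.
  { .[!apply(is.na(.) | . == "", 2, all)] }

names(df_clean)
# [1] "id" "Q2" "Q3" "Q4"
```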

Or using colSums

df[colSums(is.na(df) | df == "") != nrow(df)]
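To see why the colSums version works: is.na(df) | df == "" is a logical matrix flagging every NA or blank cell, and colSums() counts the flagged cells in each column. A column whose count equals nrow(df) (4 here) is entirely empty. A small sketch printing the intermediate counts for the sample data:

```r
# Sample data from the question
id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

# Logical matrix: TRUE where a cell is NA or "" (TRUE | NA evaluates to TRUE)
flagged <- is.na(df) | df == ""

# Number of flagged cells per column:
# id = 0, Q1 = 4, Q2 = 1, Q3 = 1, Q4 = 2, Q5 = 4
empty_counts <- colSums(flagged)
print(empty_counts)

# Keep columns that are not flagged in every row
kept <- df[empty_counts != nrow(df)]
names(kept)
# [1] "id" "Q2" "Q3" "Q4"
```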

and the inverse (keep columns with at least one value that is neither NA nor blank):

df[colSums(!(is.na(df) | df == "")) > 0]
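One caveat worth checking: the select_if and apply predicates above test all(is.na(x)) and all(x == "") separately, while the colSums versions test each cell for NA-or-blank. The two agree on the sample data but differ on a column that mixes NA and "". A small sketch using a hypothetical column Q6 (not part of the question's data):

```r
# Hypothetical column mixing NA and blanks
Q6 <- c(NA, "", NA, "")

# Separate tests: all(is.na(Q6)) is FALSE, and all(Q6 == "") is NA
# because Q6 == "" contains NA; FALSE | NA evaluates to NA
separate <- all(is.na(Q6)) | all(Q6 == "")

# Cell-wise test: every cell is NA or "", so the result is TRUE
cellwise <- all(is.na(Q6) | Q6 == "")

separate  # NA, not a clean TRUE/FALSE keep-or-drop decision
cellwise  # TRUE
```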
answered Sep 28 '22 08:09 by Ronak Shah