I have a data frame of participant questionnaire responses in wide format, with each column representing a particular question/item.
The data frame looks something like this:
id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)
I want R to remove every column whose values are all either (1) NA or (2) blank. That is, I want to drop column Q1 (which consists entirely of NAs) and column Q5 (which consists entirely of blanks in the form of "").
According to this thread, I can use the following to remove columns that consist entirely of NAs:
df[, !apply(is.na(df), 2, all)]
However, that solution does not address blanks (""). As I am doing all of this in a dplyr pipe, could someone also explain how I could incorporate the above code into a dplyr pipe?
At this moment, my dplyr pipe looks like the following:
df <- df %>%
select(relevant columns that I need)
After that I'm stuck, and I fall back on bracket subsetting with [] to keep the non-NA columns.
Thanks! Much appreciated.
The dplyr select() function selects columns; negating a selection removes those columns instead. All dplyr verbs take a data frame as their first argument.
To drop multiple columns by name, use select(dataframe, -c(column_names)), where dataframe is the input data frame and -c(column_names) is the collection of column names to be removed.
One of the easiest ways to drop columns in base R is the subset() function. For example, subset(df, select = -c(x, z)) tells R to drop the variables x and z; the '-' sign indicates dropping. Variable names should NOT be quoted when using subset() this way.
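As a rough illustration of dropping columns by name, here is a minimal sketch that applies both subset() and select() to the example data frame from the question. It assumes you already know which columns to drop (Q1 and Q5 here), so it does not detect empty columns automatically; the names dropped_base and dropped_dplyr are only illustrative.

```r
library(dplyr)

# Rebuild the example data frame from the question.
df <- data.frame(
  id = c(1, 2, 3, 4),
  Q1 = c(NA, NA, NA, NA),
  Q2 = c(1, "", 4, 5),
  Q3 = c(NA, 2, 3, 4),
  Q4 = c("", "", 2, 2),
  Q5 = c("", "", "", "")
)

# Base R: unquoted column names inside select = -c(...).
dropped_base <- subset(df, select = -c(Q1, Q5))

# dplyr: the same negated selection inside a pipe.
dropped_dplyr <- df %>% select(-c(Q1, Q5))

names(dropped_base)
names(dropped_dplyr)
```

Both calls should leave the columns id, Q2, Q3 and Q4. Naming columns by hand only works when the empty columns are known in advance, which is why the answers that follow test each column's contents instead.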
We can use select_if with a predicate that keeps only columns that are neither all NA nor all blank:
library(dplyr)
df %>%
select_if(function(x) !(all(is.na(x)) | all(x=="")))
#  id Q2 Q3 Q4
#1  1  1 NA
#2  2     2
#3  3  4  3  2
#4  4  5  4  2
Or, using the formula shorthand instead of writing out function(x):
df %>% select_if(~!(all(is.na(.)) | all(. == "")))
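One caveat with the predicate above: for a column that mixes NA and "" (for example c(NA, "", NA, "")), all(is.na(x)) is FALSE and all(x == "") evaluates to NA, so the combined test is NA rather than TRUE. The sketch below is one possible variation, assuming dplyr 1.0.0 or later where select(where(...)) is available. It tests each cell for "NA or blank" before calling all(). The helper name has_data and the extra column Q6 are hypothetical additions for illustration, not part of the original question.

```r
library(dplyr)

# Example data from the question, plus a hypothetical column Q6
# that mixes NA and "" to exercise the edge case.
df <- data.frame(
  id = c(1, 2, 3, 4),
  Q1 = c(NA, NA, NA, NA),
  Q2 = c(1, "", 4, 5),
  Q3 = c(NA, 2, 3, 4),
  Q4 = c("", "", 2, 2),
  Q5 = c("", "", "", ""),
  Q6 = c(NA, "", NA, "")
)

# Hypothetical helper: TRUE if at least one cell is neither NA nor "".
# is.na(x) | x == "" is TRUE for NA cells (TRUE | NA is TRUE),
# so all() never sees an NA here.
has_data <- function(x) !all(is.na(x) | x == "")

cleaned <- df %>%
  select(where(has_data))

names(cleaned)
```

With this predicate, Q1, Q5 and the mixed column Q6 should all be dropped, leaving id, Q2, Q3 and Q4.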
You can also modify your apply statement as:
df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]
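Or using colSums, keeping the columns where the count of NA-or-blank cells is not equal to the number of rows:
df[colSums(is.na(df) | df == "") != nrow(df)]
and its inverse, keeping the columns that have at least one cell that is neither NA nor blank: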
df[colSums(!(is.na(df) | df == "")) > 0]
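To make the colSums approach easier to follow, here is a small base R sketch that splits it into its intermediate steps on the example data. The variable names empty_cells, empty_per_column and kept are illustrative only.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  id = c(1, 2, 3, 4),
  Q1 = c(NA, NA, NA, NA),
  Q2 = c(1, "", 4, 5),
  Q3 = c(NA, 2, 3, 4),
  Q4 = c("", "", 2, 2),
  Q5 = c("", "", "", "")
)

# Logical matrix: TRUE where a cell is NA or an empty string.
empty_cells <- is.na(df) | df == ""

# Number of empty cells in each column.
empty_per_column <- colSums(empty_cells)
empty_per_column
# id Q1 Q2 Q3 Q4 Q5
#  0  4  1  1  2  4

# Keep columns that are not empty in every row.
kept <- df[empty_per_column != nrow(df)]
names(kept)
```

Only Q1 and Q5 reach a count of 4, the number of rows, so those are the two columns removed.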