 

Find columns with all missing values

Tags: dataframe, r, na

I am writing a function that needs to check whether (and which!) columns (variables) contain only missing values (NA, <NA>). The following is a fragment of the function:

    test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
    test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

    na.test <- function(data) {
      if (colSums(!is.na(data) == 0)) {
        stop("Some variable in the dataset has all missing values; remove the column to proceed")
      }
    }

    na.test(test1)
    Warning message:
    In if (colSums(!is.na(data) == 0)) { :
      the condition has length > 1 and only the first element will be used

Q1: Why do I get the above warning, and how can I fix it?
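A minimal sketch of one possible fix for Q1, assuming the test1/test2 data from the question. Two things go wrong in the original condition. First, `== 0` sits inside the `colSums()` call, and because `!` binds more loosely than `==` in R, `!is.na(data) == 0` is parsed as `!(is.na(data) == 0)`. Second, `colSums()` returns one number per column, while `if()` needs a single TRUE/FALSE, so R used only the first element and warned (from R 4.2.0 onward this is an error instead). Moving the comparison outside `colSums()` and collapsing the result with `any()` addresses both; the `invisible(TRUE)` return value is an illustrative addition, not part of the question.

```r
# Example data from the question
test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na.test <- function(data) {
  # colSums(!is.na(data)) counts the non-missing values in each column;
  # "== 0" marks columns with no data, and any() reduces that
  # per-column logical vector to the single value if() requires.
  if (any(colSums(!is.na(data)) == 0)) {
    stop("Some variable in the dataset has all missing values; remove the column to proceed")
  }
  invisible(TRUE)  # illustrative: signal that the check passed
}

na.test(test1)       # no all-NA column, returns invisibly
try(na.test(test2))  # column X2 is all NA, so this stops with the error
```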

Q2: Is there any way to find which columns are all NA, for example by returning a list of the variable names or column numbers?

SHRram asked Jul 04 '12 13:07





2 Answers

This is easy enough to do with sapply and a small anonymous function:

    sapply(test1, function(x) all(is.na(x)))
       X1    X2    X3
    FALSE FALSE FALSE

    sapply(test2, function(x) all(is.na(x)))
       X1    X2    X3
    FALSE  TRUE FALSE
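Q2 asks for variable names or column numbers rather than a logical vector. A small sketch, assuming the test2 data frame from the question: `which()` turns the logical result into column positions, and indexing `names()` with the same logical vector gives the variable names. `vapply()` is swapped in for `sapply()` here only because it guarantees exactly one logical value per column; `all_na`, `all_na_cols` and `all_na_index` are illustrative names.

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# One TRUE/FALSE per column: TRUE when every value in the column is NA
all_na <- vapply(test2, function(x) all(is.na(x)), logical(1))

all_na_cols  <- names(all_na)[all_na]  # variable names of all-NA columns
all_na_index <- unname(which(all_na))  # column numbers of all-NA columns

print(all_na_cols)   # [1] "X2"
print(all_na_index)  # [1] 2
```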

And inside a function:

    na.test <- function(x) {
      w <- sapply(x, function(x) all(is.na(x)))
      if (any(w)) {
        stop(paste("All NA in columns", paste(which(w), collapse = ", ")))
      }
    }

    na.test(test1)
    na.test(test2)
    Error in na.test(test2) : All NA in columns 2
Andrie answered Sep 19 '22 20:09



Using dplyr:

    library(dplyr)

    ColNums_NotAllMissing <- function(df) { # helper: numbers of columns that are not entirely NA
      as.vector(which(colSums(is.na(df)) != nrow(df)))
    }

    df %>% select(ColNums_NotAllMissing(.))

Example:

    x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))
    x %>% select(ColNums_NotAllMissing(.))

Or, the other way around:

    Cols_AllMissing <- function(df) { # helper: numbers of columns that are entirely NA
      as.vector(which(colSums(is.na(df)) == nrow(df)))
    }

    x %>%
      select(-Cols_AllMissing(.))
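If you would rather not depend on dplyr, the same filtering can be sketched in base R with a logical column index instead of column numbers. This assumes the example data frame `x` from this answer; `keep` and `x_kept` are illustrative names.

```r
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

# TRUE for every column that has at least one non-missing value
keep <- colSums(is.na(x)) < nrow(x)

# drop = FALSE keeps the result a data frame even if only one column survives
x_kept <- x[, keep, drop = FALSE]
print(names(x_kept))  # [1] "y" "z"
```

Indexing with a logical vector sidesteps negative indices, so no special handling is needed when no column is entirely missing.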
Tony Ladson answered Sep 20 '22 20:09
