I am writing a function that needs to check whether (and which!) columns (variables) have all missing values (`NA`, `<NA>`). The following is a fragment of the function:
```r
test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na.test <- function(data) {
  if (colSums(!is.na(data) == 0)) {
    stop("The some variable in the dataset has all missing value, remove the column to proceed")
  }
}

na.test(test1)
```

```
Warning message:
In if (colSums(!is.na(data) == 0)) { :
  the condition has length > 1 and only the first element will be used
```
Q1: Why does the above warning occur, and how can it be fixed?
Q2: Is there any way to find which columns are all `NA`, for example by outputting a list (variable names or column numbers)?
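For Q1: `colSums()` returns one number per column, but `if ()` expects a single `TRUE`/`FALSE`, so R uses only the first element and warns (since R 4.2.0 this is an error rather than a warning). There is also a precedence slip: `!` binds more loosely than `==`, so `!is.na(data) == 0` is parsed as `!(is.na(data) == 0)`, which is just `is.na(data)`. A minimal sketch of a corrected check, moving the `== 0` outside `colSums()` and collapsing the per-column result with `any()`; the name `na_test_fixed` is hypothetical:

```r
# Hypothetical corrected version of the question's check
na_test_fixed <- function(data) {
  # Number of non-missing values in each column (one number per column)
  non_missing <- colSums(!is.na(data))
  # any() collapses the per-column test to the single TRUE/FALSE that if() needs
  if (any(non_missing == 0)) {
    stop("Some variable in the dataset has all missing values; remove the column to proceed")
  }
  invisible(TRUE)
}

test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na_test_fixed(test1)  # passes silently
# na_test_fixed(test2) stops, because column X2 is entirely NA
```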
To extract rows or columns with missing values in specific columns or rows in Python's pandas, you can use the `isnull()` or `isna()` method of `pandas.DataFrame` and `pandas.Series` to check whether each element is a missing value. `isnull()` is an alias for `isna()`, and their usage is the same.
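Instead of using the `is.na()` and `colSums()` functions, we can use the `apply()` function together with `anyNA()`. With `MARGIN = 2`, `apply()` scans through every column and carries out a given operation on each one. Here the operation is detecting missing values, so `anyNA()` fits. Note that `anyNA()` returns `TRUE` when a column contains at least one `NA`, so it also flags partly missing columns, not only entirely missing ones.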
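A minimal sketch of the `apply()` approach on the question's `test2`, showing `anyNA()` next to `all(is.na(x))`, which is the test that matches "every value is missing":

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# MARGIN = 2 applies the function to each column
any_missing <- apply(test2, 2, anyNA)
all_missing <- apply(test2, 2, function(x) all(is.na(x)))

any_missing  # named logical: X1 = FALSE, X2 = TRUE, X3 = TRUE
all_missing  # named logical: X1 = FALSE, X2 = TRUE, X3 = FALSE
```

Because `apply()` first converts the data frame to a matrix, iterating over the columns directly with `sapply()` or `vapply()` is often the more natural choice for column-wise checks.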
In pandas, you can use `df.isnull().sum()`. It lists every column together with the total number of NaNs in each.
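The R counterpart of that per-column count is `colSums(is.na(df))`; a column is entirely missing exactly when its count equals `nrow(df)`. A short sketch on the question's `test2`:

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# Number of NAs in each column
na_counts <- colSums(is.na(test2))
na_counts  # X1 = 0, X2 = 3, X3 = 2

# Columns whose NA count equals the number of rows are entirely missing
names(na_counts)[na_counts == nrow(test2)]  # "X2"
```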
This is easy enough to do with `sapply()` and a small anonymous function:
```r
sapply(test1, function(x) all(is.na(x)))
#    X1    X2    X3
# FALSE FALSE FALSE

sapply(test2, function(x) all(is.na(x)))
#    X1    X2    X3
# FALSE  TRUE FALSE
```
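To get the list Q2 asks for, wrap that logical vector in `which()`: it returns the column numbers, named after the columns, and `names()` extracts just the variable names. A short sketch on `test2`:

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

all_na <- sapply(test2, function(x) all(is.na(x)))

which(all_na)         # named integer: X2 = 2 (column number plus its name)
names(which(all_na))  # "X2"
```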
And inside a function:
```r
na.test <- function(x) {
  w <- sapply(x, function(x) all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse = ", ")))
  }
}

na.test(test1)
na.test(test2)
# Error in na.test(test2) : All NA in columns 2
```
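A variant of the same function, sketched under the assumption that column names are more useful than positions in the error message. It uses `vapply()` with `logical(1)`, so the result is always a logical vector with one entry per column, whereas `sapply()` returns an empty list for a data frame with no columns. The name `na_test_names` is hypothetical:

```r
# Hypothetical variant: report column names instead of positions
na_test_names <- function(x) {
  # vapply() guarantees exactly one logical value per column
  w <- vapply(x, function(col) all(is.na(col)), logical(1))
  if (any(w)) {
    stop("All NA in columns: ", paste(names(x)[w], collapse = ", "),
         call. = FALSE)
  }
  invisible(x)
}

test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na_test_names(test1)  # passes silently and returns test1 invisibly
msg <- tryCatch(na_test_names(test2), error = conditionMessage)
msg  # "All NA in columns: X2"
```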
Using dplyr:
```r
library(dplyr)

# Helper: positions of columns that are NOT entirely NA
ColNums_NotAllMissing <- function(df) {
  as.vector(which(colSums(is.na(df)) != nrow(df)))
}

# General pattern, where df is your data frame:
# df %>% select(ColNums_NotAllMissing(.))

# Example:
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))
x %>% select(ColNums_NotAllMissing(.))
```
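For comparison, the same column filter can be written without dplyr using base R's `Filter()`, which keeps the list elements (here, data-frame columns) for which the predicate returns `TRUE`. A sketch using the same example data (recent dplyr versions also offer the `where()` selection helper for predicate-based selection):

```r
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

# Keep only the columns that are not entirely NA
kept <- Filter(function(col) !all(is.na(col)), x)
names(kept)  # "y" "z"
```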
Or, the other way around:
```r
# Helper: positions of columns that ARE entirely NA
Cols_AllMissing <- function(df) {
  as.vector(which(colSums(is.na(df)) == nrow(df)))
}

x %>% select(-Cols_AllMissing(.))
```
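If this negative-index pattern is mirrored in base R, one pitfall is worth guarding against: when no column is entirely missing, the index vector is `integer(0)`, and `df[-integer(0)]` selects zero columns rather than all of them. A minimal base R sketch that handles that case; `drop_all_na_cols` is a hypothetical helper name:

```r
drop_all_na_cols <- function(df) {
  # Positions of columns in which every value is NA
  idx <- which(colSums(is.na(df)) == nrow(df))
  # Guard: df[-integer(0)] would return a data frame with no columns
  if (length(idx) > 0) df[-idx] else df
}

x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))
names(drop_all_na_cols(x))               # "y" "z"
names(drop_all_na_cols(x[c("y", "z")]))  # nothing to drop: "y" "z"
```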