I want to find the names of all columns with NA or missing data and store these column names in a vector.
# create matrix
a <- c(1,2,3,4,5,NA,7,8,9,10,NA,12,13,14,NA,16,17,18,19,20)
cnames <- c("aa", "bb", "cc", "dd", "ee")
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- cnames
mymatrix
#      aa bb cc dd ee
# [1,]  1  2  3  4  5
# [2,] NA  7  8  9 10
# [3,] NA 12 13 14 NA
# [4,] 16 17 18 19 20
The desired result: columns "aa" and "ee".
My attempt:
bad <- character()
for (j in 1:4){
tmp <- which(colnames(mymatrix[j, ]) %in% c("", "NA"))
bad <- tmp
}
However, I keep getting integer(0) as my output. Any help is appreciated.
There are two easy ways to select the columns of an R data frame that have no missing values. For a data frame called df, the first is df[, colSums(is.na(df)) == 0], which returns a data frame (or a plain vector if only one column remains), and the second is t(na.omit(t(df))), which returns a matrix.
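As a rough illustration of those two approaches, the sketch below uses a small made-up data frame (the name df and its values are assumptions for the example, not data from the question). drop = FALSE is added so that the first method still returns a data frame when only one column survives.

```r
# Hypothetical example data: columns aa and ee contain NA
df <- data.frame(
  aa = c(1, NA, NA, 16),
  bb = c(2, 7, 12, 17),
  cc = c(3, 8, 13, 18),
  ee = c(5, 10, NA, 20)
)

# Method 1: keep the columns whose NA count is zero (result is a data frame)
complete_df <- df[, colSums(is.na(df)) == 0, drop = FALSE]

# Method 2: transpose, drop the rows that contain NA, transpose back
# (result is a matrix, because t() converts the data frame to a matrix)
complete_mat <- t(na.omit(t(df)))

names(complete_df)
colnames(complete_mat)
```

Both calls should report the columns bb and cc for this data.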
In R, missing values are represented by the symbol NA (not available). Undefined numeric results (e.g., 0/0) are represented by the symbol NaN (not a number), while dividing a nonzero number by zero gives Inf or -Inf. Unlike SAS, R uses the same symbol, NA, for missing character and numeric data.
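A small sketch of how these symbols behave under is.na() and is.nan(); the vector names x_num and x_chr are made up for the example.

```r
# Hypothetical vectors: one numeric, one character, each with a missing value
x_num <- c(1, NA, 0/0)   # 0/0 evaluates to NaN
x_chr <- c("a", NA)      # the same NA symbol is used for character data

na_num  <- is.na(x_num)   # TRUE for both NA and NaN
nan_num <- is.nan(x_num)  # TRUE only for NaN
na_chr  <- is.na(x_chr)   # TRUE for the missing string
```

Because is.na() flags NaN as well as NA, the column checks in this thread treat both as missing.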
Like this?
colnames(mymatrix)[colSums(is.na(mymatrix)) > 0]
# [1] "aa" "ee"
Or as suggested by @thelatemail:
names(which(colSums(is.na(mymatrix)) > 0))
# [1] "aa" "ee"
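For what it's worth, the loop in the question returns integer(0) because mymatrix[j, ] is a plain named vector, so colnames() on it is NULL. It also walks over rows rather than columns, compares names against the strings "" and "NA" instead of testing the values, and overwrites bad on every pass. A minimal sketch of a corrected loop, rebuilding the same mymatrix as in the question:

```r
# Rebuild the matrix from the question
a <- c(1, 2, 3, 4, 5, NA, 7, 8, 9, 10, NA, 12, 13, 14, NA, 16, 17, 18, 19, 20)
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- c("aa", "bb", "cc", "dd", "ee")

# A single row is a named vector with no dim attribute, so colnames() is NULL
row_colnames <- colnames(mymatrix[2, ])

# Loop over columns, test the values, and append instead of overwriting
bad <- character()
for (j in seq_len(ncol(mymatrix))) {
  if (any(is.na(mymatrix[, j]))) {
    bad <- c(bad, colnames(mymatrix)[j])
  }
}
bad
```

The vectorised one-liners above do the same thing without an explicit loop.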
R 3.1 introduced an anyNA function, which is more convenient and faster:
colnames(mymatrix)[ apply(mymatrix, 2, anyNA) ]
Old answer:
If it's a very long matrix, apply + any can short circuit and run a bit faster.
apply(is.na(mymatrix), 2, any)
#    aa    bb    cc    dd    ee
#  TRUE FALSE FALSE FALSE  TRUE
colnames(mymatrix)[apply(is.na(mymatrix), 2, any)]
# [1] "aa" "ee"
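If you have a data frame with non-numeric columns, this solution is more general (building on previous answers). Note that sapply iterates over columns only for a data frame; on a plain matrix it iterates over individual elements, so convert with as.data.frame first if needed:
R 3.1 +
names(which(sapply(as.data.frame(mymatrix), anyNA)))
or
names(which(sapply(as.data.frame(mymatrix), function(x) any(is.na(x)))))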
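To make the data frame case concrete, here is a minimal sketch with a made-up data frame of mixed column types (mydf and its columns are assumptions for illustration). It also shows what happens when sapply is pointed at a plain matrix instead.

```r
# Hypothetical data frame with character, numeric and logical columns
mydf <- data.frame(
  id    = c("a", "b", NA),
  score = c(1.5, 2.0, 3.5),
  flag  = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

# sapply visits each column of a data frame and returns a named logical vector
bad_cols <- names(which(sapply(mydf, anyNA)))
bad_cols

# On a plain matrix, sapply visits every element, so the names are lost
m <- matrix(c(1, NA, 3, 4), nrow = 2, dimnames = list(NULL, c("x", "y")))
per_element <- sapply(m, anyNA)       # length 4, one result per cell
matrix_names <- names(which(per_element))  # NULL
```

For this data, bad_cols should contain id and flag, while matrix_names comes back as NULL.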