To calculate the number of NAs in the entire data.frame, I can use sum(is.na(df)).
However, how can I count the number of NAs in each column of a big data.frame? I tried apply(df, 2, function(x) sum(is.na(df$x))),
but that didn't seem to work.
To find the sum of non-missing values in an R data frame column, we can use the sum function and set na.rm to TRUE. For example, if we have a data frame called df that contains a column x with some missing values, then the sum of the non-missing values can be found with the command sum(df$x, na.rm = TRUE).
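A minimal sketch of the na.rm behaviour described above. The data frame and its values are made up for illustration; only the column name x comes from the text.

```r
# Hypothetical data frame with two missing values in column x
df <- data.frame(x = c(1, NA, 3, NA, 5))

# With na.rm = TRUE the NAs are dropped before summing
total <- sum(df$x, na.rm = TRUE)

# Without na.rm, a single NA makes the whole sum NA
total_with_na <- sum(df$x)
```

Here total is 9, while total_with_na is NA.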
To find the sum of every n values in R data frame columns, we can use the rowsum function together with rep, where rep builds the grouping vector that assigns each consecutive block of n rows to the same group.
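One way this could look, as a sketch only: the data frame, the column names a and b, and n = 2 are assumptions for illustration, and the number of rows is assumed to be a multiple of n.

```r
# Hypothetical data frame with 6 rows
df <- data.frame(a = 1:6, b = c(10, 20, 30, 40, 50, 60))
n <- 2

# Grouping vector 1, 1, 2, 2, 3, 3: each run of n rows shares a group id
groups <- rep(seq_len(nrow(df) / n), each = n)

# rowsum() adds up the rows within each group, column by column
every_n <- rowsum(df, group = groups)
```

The result has one row per block of n rows. If the columns contain missing values, rowsum also accepts na.rm = TRUE.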
Thus, sum(is.na(x)) gives you the total number of missing values in x . To get the proportion of missing values you can proceed by dividing the result of the previous operation by the length of the input vector.
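As a small illustration of the count and the proportion, using a made-up vector x:

```r
# Hypothetical input vector with two missing values out of five
x <- c(4, NA, 7, NA, 1)

# Count of missing values
n_missing <- sum(is.na(x))

# Proportion of missing values: count divided by vector length
prop_missing <- n_missing / length(x)

# mean() of the logical vector gives the same proportion in one step
prop_missing_alt <- mean(is.na(x))
```

Both prop_missing and prop_missing_alt come out at 0.4 for this vector.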
In R, the easiest way to find columns that contain missing values is to combine is.na() and colSums(). First, you count the number of NAs per column. Then, you use a function such as names() or colnames() to return the names of the columns with at least one missing value.
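The two steps above could be sketched like this. The data frame and its column names are invented for the example.

```r
# Hypothetical data frame: columns a and c contain NAs, b does not
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2)
)

# Step 1: count the NAs per column (a named numeric vector)
na_counts <- colSums(is.na(df))

# Step 2: keep the names of columns with at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
```

For this data, cols_with_na contains "a" and "c".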
You could try:
colSums(is.na(df))
# V1 V2 V3 V4 V5
# 2 4 2 4 4
Data used:
set.seed(42)
df <- as.data.frame(matrix(sample(c(NA,0:4), 5*20,replace=TRUE), ncol=5))
With dplyr
...
df %>%
summarise_all(funs(sum(is.na(.))))
or using the purrr library:
map(df, ~sum(is.na(.)))
You can use sapply:
sapply(X = df, FUN = function(x) sum(is.na(x)))
Since the dplyr::summarise_all function has been superseded by using across inside the original function, and dplyr::funs has been deprecated, the current tidyverse approach would probably be something like:
df %>%
summarise(across(everything(), ~ sum(is.na(.x))))
To maintain the names of each column, use this variation (substitute name of dataframe for df in example):
apply(is.na(df), 2, sum)
You could try the following functions
Using colSums()
colSums(is.na(df))
Using apply()
apply(df, 2, function(x) {sum(is.na(x))})
Using a function
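sum.na <- function(x) {
  sum(is.na(x))
}
# Called on the whole data frame this prints the total number of NAs;
# sapply(df, sum.na) would give the count per column
print(sum.na(df))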
Using lapply()
lapply(df, function(x) sum(is.na(x)))
Using sapply()
sapply(df, function(x) sum(is.na(x)))
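As for why the attempt in the question returns only zeros: inside the anonymous function, x already holds the column values, so df$x looks for a column literally named x. A rough sketch with a made-up data frame, assuming no column name is x or starts with x:

```r
# Hypothetical data frame with 1 NA in V1 and 2 NAs in V2
df <- data.frame(V1 = c(1, NA, 3), V2 = c(NA, NA, 6))

# df$x is NULL here, is.na(NULL) is an empty logical vector,
# and its sum is 0 for every column
broken <- apply(df, 2, function(x) sum(is.na(df$x)))

# Using the function argument x directly counts the NAs per column
fixed <- apply(df, 2, function(x) sum(is.na(x)))
```

With this data, broken is 0 for both columns, while fixed gives 1 for V1 and 2 for V2, the same counts as colSums(is.na(df)).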