R: how to count the number of NAs in each column of a data.frame

Tags:

r

To count the NAs in the entire data.frame, I can use sum(is.na(df)). However, how can I count the number of NAs in each column of a big data.frame? I tried apply(df, 2, function(x) sum(is.na(df$x))), but that didn't seem to work.
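A minimal reproduction of why that attempt fails, using a small made-up data frame (hypothetical data, not from the question): inside the anonymous function, `x` is already one column vector, so `df$x` looks for a column literally named `x`. There is none, so it returns `NULL`, and `sum(is.na(NULL))` is 0 for every column.

```r
# Hypothetical example data: column a has 1 NA, column b has 2 NAs
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# Original attempt: df$x asks for a column named "x", which gives NULL,
# so every column gets a count of 0 (is.na(NULL) also emits a warning)
broken <- suppressWarnings(apply(df, 2, function(x) sum(is.na(df$x))))
print(broken)   # a 0, b 0

# Fix: count NAs in the column vector x that apply() passes in
fixed <- apply(df, 2, function(x) sum(is.na(x)))
print(fixed)    # a 1, b 2
```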

asked Oct 09 '14 08:10 by Adrian


6 Answers

You could try:

colSums(is.na(df))
#  V1 V2 V3 V4 V5 
#   2  4  2  4  4 

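Example data (R 3.6.0 changed the default sample() algorithm, so newer R versions may give different counts for this seed):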

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA,0:4), 5*20,replace=TRUE), ncol=5))
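
To see what colSums(is.na(df)) does step by step, here is a sketch on a small hand-made data frame (hypothetical data, not the set.seed(42) example): `is.na()` applied to a data frame returns a logical matrix of the same shape, and `colSums()` counts the `TRUE` values in each column, since `TRUE` is treated as 1.

```r
# Hypothetical mixed-type data frame
df_small <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "z"),
  flag = c(TRUE, FALSE, TRUE)
)

na_mat <- is.na(df_small)   # logical matrix, TRUE where a value is missing
print(dim(na_mat))          # 3 rows, 3 columns, same shape as df_small

counts <- colSums(na_mat)   # TRUE counts as 1, giving the NA count per column
print(counts)               # a 1, b 2, flag 0
```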

answered Oct 19 '22 04:10 by akrun


With dplyr:

library(dplyr)
df %>%
  summarise_all(funs(sum(is.na(.))))

or using the purrr package:

library(purrr)
map(df, ~sum(is.na(.)))

answered Oct 19 '22 04:10 by Nettle


You can use sapply():

sapply(X = df, FUN = function(x) sum(is.na(x)))
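`sapply()` decides the shape of its result on its own. A stricter sketch uses `vapply()`, which checks that every column yields exactly one integer; `integer(1)` fits because `sum()` over a logical vector returns an integer. The data frame below is hypothetical.

```r
# Hypothetical example data
df <- data.frame(x = c(NA, 2, NA), y = c("p", NA, "r"), stringsAsFactors = FALSE)

# vapply() errors if any column's result is not a single integer
na_counts <- vapply(df, function(col) sum(is.na(col)), FUN.VALUE = integer(1))
print(na_counts)   # x 2, y 1
```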

answered Oct 19 '22 02:10 by Victorp


Since dplyr::summarise_all() has been superseded by using across() inside summarise(), and dplyr::funs() has been deprecated, the current tidyverse approach would probably be something like:

library(dplyr)
df %>% 
  summarise(across(everything(), ~ sum(is.na(.x))))

answered Oct 19 '22 02:10 by climatestudent


To keep the name of each column in the result, use this variation (substitute the name of your data frame for df):

apply(is.na(df), 2, sum)
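For comparison, a sketch on hypothetical data: both `apply(is.na(df), 2, sum)` and `colSums(is.na(df))` keep the column names. The visible difference is the storage type, because `sum()` over a logical column returns an integer while `colSums()` always returns doubles.

```r
# Hypothetical example data
df <- data.frame(u = c(NA, 1, 2), v = c(NA, NA, NA))

via_apply <- apply(is.na(df), 2, sum)   # integer vector named u, v
via_colsums <- colSums(is.na(df))       # double vector named u, v

print(via_apply)             # u 1, v 3
print(typeof(via_apply))     # "integer"
print(typeof(via_colsums))   # "double"

# Same names and counts once the storage type is ignored
stopifnot(isTRUE(all.equal(via_apply, via_colsums)))
```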

answered Oct 19 '22 02:10 by dwolf


You could try the following functions

  1. Using colSums()

    colSums(is.na(df))

  2. Using apply()

    apply(df, 2, function(x) {sum(is.na(x))})

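  3. Using a custom function with sapply()

    sum.na <- function(x) { sum(is.na(x)) }

    # sum.na(df) alone returns one total for the whole data frame;
    # sapply() applies it to each column instead
    print(sapply(df, sum.na))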

  4. Using lapply()

    lapply(df, function(x) sum(is.na(x)))

  5. Using sapply()

    sapply(df, function(x) sum(is.na(x)))
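The helper sum.na() from item 3 counts NAs in whatever it is given: applied to the whole data frame it returns one grand total, and applied column by column with `sapply()` it returns the per-column counts the question asks for. A sketch on hypothetical data:

```r
sum.na <- function(x) sum(is.na(x))

# Hypothetical example data
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"), stringsAsFactors = FALSE)

total <- sum.na(df)             # one number for the whole data frame
per_col <- sapply(df, sum.na)   # one count per column
print(total)                    # 3
print(per_col)                  # a 1, b 2

# The per-column counts add up to the grand total
stopifnot(total == sum(per_col))
```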

answered Oct 19 '22 04:10 by thisisadi