 

Determine the number of NA values in a column

Tags:

dataframe

r

People also ask

How do I count the number of NAs in a column in R?

Counting NAs across either rows or columns can be done with the apply() function. It takes three main arguments: X is the input matrix, MARGIN is an integer, and FUN is the function to apply to each row or column. MARGIN = 1 applies the function across rows and MARGIN = 2 across columns.
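A minimal sketch of the apply() approach, assuming a small hypothetical data frame `df`. Because is.na() on a data frame returns a logical matrix, apply() can sum it by column (MARGIN = 2) or by row (MARGIN = 1):

```r
# Hypothetical example data with some missing values
df <- data.frame(x = c(1, NA, 3), y = c(NA, NA, "a"))

# is.na(df) is a logical matrix; summing TRUE values counts the NAs
na_per_column <- apply(is.na(df), 2, sum)  # MARGIN = 2: one count per column
na_per_row    <- apply(is.na(df), 1, sum)  # MARGIN = 1: one count per row

na_per_column
na_per_row
```

colSums(is.na(df)) and rowSums(is.na(df)) return the same counts and are usually faster, since they avoid calling sum() once per row or column.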

How do I count the number of NAs in a row in R?

Count the number of NAs per row with rowSums(). The first method for finding the number of NAs per row in R combines the functions is.na() and rowSums(). Both are base R functions, so it is not necessary to install additional packages.
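A short sketch of the rowSums() method on a hypothetical data frame `df`; only base R is used:

```r
# Hypothetical example data
df <- data.frame(a = c(NA, 2, 3), b = c(NA, NA, 6), c = c(7, 8, NA))

# is.na(df) marks each missing cell as TRUE; rowSums() adds them up per row
na_per_row <- rowSums(is.na(df))
na_per_row
```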

How do you find the number of missing values in a data frame?

In pandas, sum() of a DataFrame calculates the sum for each column, whereas sum() of a numpy.ndarray calculates the sum of all elements by default. Therefore, by calling sum() on the values attribute (a numpy.ndarray) of the result of isnull(), you can get the total number of missing values.


You're over-thinking the problem:

sum(is.na(df$col))

If you are looking for NA counts for each column in a data frame, then:

na_count <- sapply(x, function(y) sum(is.na(y)))

should give you a named vector with the count for each column.

na_count <- data.frame(na_count)

This puts the counts into a data frame like:

-------------------------
| row.names | na_count |
-------------------------
| column_1  | count    |
-------------------------
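Putting the two steps together, here is a self-contained sketch; the data frame `x` below is hypothetical example data:

```r
# Hypothetical example data
x <- data.frame(id = 1:4,
                score = c(10, NA, 30, NA),
                name = c("a", NA, "c", "d"))

# Count the NAs in each column; sapply() returns a named integer vector
na_count <- sapply(x, function(y) sum(is.na(y)))

# Convert to a data frame: the column names of x become the row names
na_count <- data.frame(na_count)
na_count
```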

Try the colSums() function:

df <- data.frame(x = c(1,2,NA), y = rep(NA, 3))

colSums(is.na(df))

#x y 
#1 3 

To count the number of NAs in the entire data frame, you can also use:

sum(is.na(df))
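As a quick check on the same kind of hypothetical data, the total from sum(is.na(df)) equals the sum of the per-column counts from colSums():

```r
df <- data.frame(x = c(1, 2, NA), y = rep(NA, 3))

total_na   <- sum(is.na(df))        # every NA in the data frame
per_column <- colSums(is.na(df))    # NAs in each column

# The grand total is just the per-column counts added together
total_na == sum(per_column)
```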

A quick and easy tidyverse solution for getting an NA count for every column is summarise_all(), which I think reads much more easily than a purrr or sapply solution:

library(tidyverse)
# Example data
df <- tibble(col1 = c(1, 2, 3, NA), 
             col2 = c(NA, NA, "a", "b"))

df %>% summarise_all(~ sum(is.na(.)))
#> # A tibble: 1 x 2
#>    col1  col2
#>   <int> <int>
#> 1     1     2

Or using the more modern across() function:

df %>% summarise(across(everything(), ~ sum(is.na(.))))

summary() also reports the number of NAs for each variable (as an NA's entry), so you can use it to see the NA counts of several variables at once.
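A small sketch with hypothetical data. For numeric, logical and factor columns, summary() adds an NA's entry only when the column actually contains missing values; character columns report only their length, class and mode, so they show no NA count:

```r
# Hypothetical example data: one numeric column and one factor column
df <- data.frame(col1 = c(1, 2, 3, NA),
                 col2 = factor(c(NA, NA, "a", "b")))

# Each column's summary includes an "NA's" line when it has missing values
summary(df)

# The NA count for a single column can be pulled out by name
summary(df$col1)[["NA's"]]
```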