
Determine the number of NA values in a column

Tags: dataframe, r

People also ask

How do I count the number of NA in a column in R?

Counting NAs across either rows or columns can be done with the apply() function. It takes three main arguments: X is the input matrix, MARGIN is an integer, and FUN is the function to apply to each row or column. MARGIN = 1 applies the function across rows and MARGIN = 2 across columns.
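A minimal sketch of the apply() approach described above. The data frame `df` and the variable names are hypothetical, made up for illustration only:

```r
# Hypothetical example data (not from the original question)
df <- data.frame(x = c(1, 2, NA), y = c(NA, "a", NA))

# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(df)

# MARGIN = 2 applies sum() to each column, MARGIN = 1 to each row
na_per_col <- apply(na_matrix, 2, sum)
na_per_row <- apply(na_matrix, 1, sum)

print(na_per_col)
print(na_per_row)
```

Summing a logical vector counts its TRUE values, which is why sum() works as the counting function here.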

How do I count the number of NA in a row in R?

The first method to count the NAs per row in R combines the functions is.na() and rowSums(). Both are base R functions, so no additional packages need to be installed.
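As a rough illustration of the rowSums() method, assuming a small made-up data frame `df` (the names below are hypothetical):

```r
# Hypothetical example data (not from the original question)
df <- data.frame(x = c(1, 2, NA), y = c(NA, 5, NA))

# is.na(df) is a logical matrix; rowSums() adds up the TRUEs in each row
na_per_row <- rowSums(is.na(df))
print(na_per_row)

# Indices of the rows that contain at least one NA
rows_with_na <- which(na_per_row > 0)
print(rows_with_na)
```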

How do you find the number of missing values in a data frame?

In pandas, sum() of a numpy.ndarray calculates the sum of all elements by default. Therefore, by calling sum() on the values attribute (a numpy.ndarray) of the result of isnull(), you can get the total number of missing values in a DataFrame.


You're over-thinking the problem:

sum(is.na(df$col))

If you are looking for NA counts for each column in a dataframe (here called x) then:

na_count <- sapply(x, function(y) sum(is.na(y)))

should give you a named vector with the counts for each column.

na_count <- data.frame(na_count)

This should output the counts nicely in a dataframe like:

-------------------------
| row.names | na_count |
-------------------------
| column_1  | count    |
-------------------------
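A small end-to-end sketch of the per-column count and its conversion to a dataframe. The example data and the name `na_count_df` are hypothetical, chosen only to make the snippet runnable:

```r
# Hypothetical example data (not from the original question)
x <- data.frame(column_1 = c(1, NA, 3), column_2 = c(NA, NA, "a"))

# One NA count per column, returned as a named vector
na_count <- sapply(x, function(y) sum(is.na(y)))

# The column names become the row names of the resulting dataframe
na_count_df <- data.frame(na_count)
print(na_count_df)
```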

Try the colSums function

df <- data.frame(x = c(1,2,NA), y = rep(NA, 3))

colSums(is.na(df))

#x y 
#1 3 

If you are looking to count the number of NAs in the entire dataframe, you could also use:

sum(is.na(df))

A quick and easy tidyverse solution to get an NA count for all columns is summarise_all(), which I think is much easier to read than a solution using purrr or sapply:

library(tidyverse)
# Example data
df <- tibble(col1 = c(1, 2, 3, NA), 
             col2 = c(NA, NA, "a", "b"))

df %>% summarise_all(~ sum(is.na(.)))
#> # A tibble: 1 x 2
#>    col1  col2
#>   <int> <int>
#> 1     1     2

Or, using the more modern across() function (summarise_all() has been superseded since dplyr 1.0.0):

df %>% summarise(across(everything(), ~ sum(is.na(.))))

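The summary() output also reports the number of NAs for each variable, so you can use it when you want NA counts for several variables at once. Note that this applies to numeric, logical, and factor columns; for character columns summary() does not report NAs.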
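A short sketch of reading NA counts from summary(), assuming a made-up data frame `df` (all names here are hypothetical):

```r
# Hypothetical example data (not from the original question)
df <- data.frame(x = c(1, 2, NA), y = c(NA, TRUE, NA))

# The printed summary includes an NA's entry for each column that has NAs
print(summary(df))

# For a single numeric vector, the count is stored under the name "NA's"
s <- summary(df$x)
na_in_x <- unclass(s)[["NA's"]]
print(na_in_x)
```

The "NA's" entry only appears when the vector actually contains missing values, so sum(is.na(...)) is the more robust choice when the count is needed in code.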

