
Calculate the percentage of NAs in each column using dplyr

Tags:

r

dplyr

I have a data frame in which some columns have missing values. Is there a way (using dplyr) to efficiently calculate the percentage of each column that is missing, i.e. NA? Sort of like a colSums equivalent, so I don't have to calculate the percentage missing for each column individually.

MP61 asked Nov 04 '15 02:11


People also ask

How do I get the percentage of missing values in each column in R?

To find the percentage of missing values in each column of an R data frame, combine colMeans() with is.na(). Because is.na() returns TRUE/FALSE, colMeans() gives the proportion of missing values in each column. Multiplying that result by 100 gives the percentage.
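As a minimal sketch of the approach just described (the data frame `df` and its column names are made up for illustration):

```r
# Hypothetical example data; `df`, `height` and `weight` are illustrative only.
df <- data.frame(
  height = c(1.7, NA, 1.8, NA),
  weight = c(NA, 70, 65, 80)
)

# is.na(df) is a logical matrix; colMeans() averages each column,
# which gives the fraction of NA values. Multiply by 100 for percentages.
pct_missing <- 100 * colMeans(is.na(df))
pct_missing
```

Here `height` has two NAs out of four values (50%) and `weight` has one out of four (25%).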

How do I get the percentage of a column in R?

To calculate percent, we need to divide the counts by the count sums for each sample, and then multiply by 100. This can also be done using the function decostand from the vegan package with method = "total".

How do I count the number of NAs in a row in R?

You can combine is.na() with rowSums(). is.na() marks each missing value as TRUE, and rowSums(), as the name suggests, sums the values of all elements in each row. Since TRUE counts as 1 and FALSE as 0, summing the TRUEs is the same as counting the NAs.
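A short sketch of the row-wise count, using a made-up data frame `df` (the name and values are assumptions for illustration):

```r
# Hypothetical example data for illustration.
df <- data.frame(
  a = c(1, NA, NA),
  b = c(NA, NA, 3)
)

# is.na(df) gives a logical matrix; rowSums() adds up the TRUEs in each row,
# so each element is the number of NAs in that row.
na_per_row <- rowSums(is.na(df))
na_per_row
```

Row 1 has one NA, row 2 has two, and row 3 has one.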

How do you sum missing values in R?

R automatically converts logical vectors to integer vectors when using arithmetic functions. In the process, TRUE is turned into 1 and FALSE into 0. Thus, sum(is.na(x)) gives you the total number of missing values in x.
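For example, with a made-up vector `v` (the values are illustrative only), the same coercion makes both the count and the share of missing values one-liners:

```r
# Hypothetical vector with three missing values out of five.
v <- c(4, NA, 7, NA, NA)

# TRUE is treated as 1 and FALSE as 0, so sum() counts the NAs
# and mean() gives the fraction of values that are NA.
n_missing <- sum(is.na(v))
frac_missing <- mean(is.na(v))

n_missing
frac_missing
```

`n_missing` is 3 and `frac_missing` is 0.6.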


2 Answers

First, I created some test data for you:

a <- c(1, NA, NA, 4)
b <- c(NA, 2, 3, 4)
x <- data.frame(a, b)
x
#    a  b
# 1  1 NA
# 2 NA  2
# 3 NA  3
# 4  4  4

Then you can use colMeans(is.na(x)), which returns the proportion of NA values in each column (multiply by 100 for a percentage):

colMeans(is.na(x))
#    a    b 
# 0.50 0.25 
Gavin answered Oct 17 '22 07:10


We can use summarise_each:

 library(dplyr)
 x %>% 
   summarise_each(funs(100*mean(is.na(.))))
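Note that summarise_each() and funs() are deprecated in later dplyr releases. A possible equivalent, assuming dplyr 1.0.0 or newer and reusing the test data frame `x` from the first answer, is to use across() inside summarise(). This is a sketch, not part of the original answer:

```r
library(dplyr)

# Same test data as in the first answer.
x <- data.frame(a = c(1, NA, NA, 4), b = c(NA, 2, 3, 4))

# across(everything(), ...) applies the lambda to every column;
# mean(is.na(.x)) is the fraction of NAs, scaled by 100 to a percentage.
pct_na <- x %>%
  summarise(across(everything(), ~ 100 * mean(is.na(.x))))
pct_na
```

The result is a one-row data frame with 50 for column `a` and 25 for column `b`.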
akrun answered Oct 17 '22 05:10