r - using summarise_each() to count records ignoring NAs

Tags: r, dplyr

Is there a way to use summarise_each() to count the number of records in a data frame, but ignore NAs?

Example / Sample Data

df_sample <- structure(list(var_1 = c(NA, NA, NA, NA, 1, NA), var_2 = c(NA, 
  NA, NA, NA, 2, 1), var_3 = c(NA, NA, NA, NA, 3, 2), var_4 = c(NA_real_, 
  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), var_5 = c(NA, 
  NA, NA, NA, 4, 3)), .Names = c("var_1", "var_2", "var_3", "var_4", 
  "var_5"), row.names = 5:10, class = "data.frame")

> df_sample
   var_1 var_2 var_3 var_4 var_5
5     NA    NA    NA    NA    NA
6     NA    NA    NA    NA    NA
7     NA    NA    NA    NA    NA
8     NA    NA    NA    NA    NA
9      1     2     3    NA     4
10    NA     1     2    NA     3

Using summarise_each() and n() counts all the records:

library(dplyr)
df_sample %>%
  summarise_each(funs(n()))

## result:
   var_1 var_2 var_3 var_4 var_5
1     6     6     6     6     6

I know that n() doesn't accept arguments. Is there another function I can use within summarise_each() that ignores NAs when counting the records, and returns zero if the variable is all NA?

Desired Result

   var_1 var_2 var_3 var_4 var_5
1     1     2     2     0     2

The following method gets me part of the way there, but I would also like to return a 0 for var_4:

library(reshape2)  # provides melt()
df_sample %>%
  melt %>%
  filter(!is.na(value)) %>%
  group_by(variable) %>%
  summarise(records = n())

## result:
  variable records
1    var_1       1
2    var_2       2
3    var_3       2
4    var_5       2
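
A note on why var_4 disappears above: filter(!is.na(value)) removes every var_4 row before grouping, so no var_4 group is left for n() to count. A minimal sketch of one way around this counts the non-NA values inside summarise() instead of filtering first. It uses tidyr's pivot_longer() as a stand-in for melt, which is an assumption and not part of the original attempt; long_counts is a name chosen here for illustration.

```r
library(dplyr)
library(tidyr)  # assumption: pivot_longer() used here in place of reshape2::melt()

# Same sample data as in the question
df_sample <- data.frame(
  var_1 = c(NA, NA, NA, NA, 1, NA),
  var_2 = c(NA, NA, NA, NA, 2, 1),
  var_3 = c(NA, NA, NA, NA, 3, 2),
  var_4 = rep(NA_real_, 6),
  var_5 = c(NA, NA, NA, NA, 4, 3)
)

long_counts <- df_sample %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  # Count non-NA values per group; an all-NA group yields 0 instead of vanishing
  summarise(records = sum(!is.na(value)))

long_counts
# records for var_1..var_5: 1, 2, 2, 0, 2
```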
asked Jun 27 '15 09:06 by tospig


2 Answers

Try:

df_sample %>% summarise_all(funs(sum(!is.na(.))))

Which gives:

#  var_1 var_2 var_3 var_4 var_5
#1     1     2     2     0     2
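
For readers on newer dplyr releases: summarise_each() was deprecated in favour of summarise_all() and its siblings, funs() was later deprecated as well, and the scoped verbs were superseded by across() in dplyr 1.0.0. The following is a sketch of the same count written with across(), assuming dplyr >= 1.0.0 is installed; counts is a name chosen here for illustration.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Same sample data as in the question
df_sample <- data.frame(
  var_1 = c(NA, NA, NA, NA, 1, NA),
  var_2 = c(NA, NA, NA, NA, 2, 1),
  var_3 = c(NA, NA, NA, NA, 3, 2),
  var_4 = rep(NA_real_, 6),
  var_5 = c(NA, NA, NA, NA, 4, 3)
)

# !is.na(.x) is TRUE for each non-missing value; sum() counts the TRUEs,
# so an all-NA column gives 0
counts <- df_sample %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))

counts
#   var_1 var_2 var_3 var_4 var_5
# 1     1     2     2     0     2
```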
answered Nov 19 '22 21:11 by Steven Beaupré


Using data.table

 library(data.table)
 setDT(df_sample)[, lapply(.SD, function(x) sum(!is.na(x)))]
 #   var_1 var_2 var_3 var_4 var_5
 #1:     1     2     2     0     2

Or with base R

 vapply(df_sample, function(x) sum(!is.na(x)), numeric(1))
 #var_1 var_2 var_3 var_4 var_5 
 #  1     2     2     0     2 
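
Another base R variant, offered as a sketch and not as part of the original answer: is.na() on a data frame returns a logical matrix, so colSums() on its negation counts the non-NA entries per column in a single call. The name na_free_counts is chosen here for illustration.

```r
# Same sample data as in the question
df_sample <- data.frame(
  var_1 = c(NA, NA, NA, NA, 1, NA),
  var_2 = c(NA, NA, NA, NA, 2, 1),
  var_3 = c(NA, NA, NA, NA, 3, 2),
  var_4 = rep(NA_real_, 6),
  var_5 = c(NA, NA, NA, NA, 4, 3)
)

# !is.na(df_sample) is a logical matrix; colSums() treats TRUE as 1
na_free_counts <- colSums(!is.na(df_sample))

na_free_counts
# var_1 var_2 var_3 var_4 var_5
#     1     2     2     0     2
```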
answered Nov 19 '22 23:11 by akrun