I'm trying to wrap some dplyr magic inside a function to produce a data.frame that I then print with xtable.
The ultimate aim is to have a dplyr version of this working. Reading around, I came across the very useful summarise_each() function, which I can use to process all columns after grouping with regroup() (since this is within a function).
The problem I've encountered (so far) is with calling is.na() from within summarise_each(funs(is.na)), which fails with Error: expecting a single value.
I'm purposefully not posting my function just yet, but a minimal example follows (NB: this uses group_by(), whilst in my function I replace this with regroup())...
library(dplyr)
library(magrittr)
t <- data.frame(grp = rbinom(10, 1, 0.5),
                a = as.factor(round(rnorm(10))),
                b = rnorm(10),
                c = rnorm(10))
t %>%
group_by(grp) %>%  ## This is replaced with regroup() in my function
summarise_each(funs(is.na))
Error: expecting a single value
Running this fails, and it's the call to is.na() that is the problem: if I instead work out the number of observations in each column (required to derive the proportion missing), it works...
> t %>%
group_by(grp) %>%  ## This is replaced with regroup() in my function
summarise_each(funs(length))
Source: local data frame [2 x 4]
  grp a b c
1   0 8 8 8
2   1 2 2 2
The real problem, though, is that I do not need just is.na() within each column but sum(is.na()), as per the linked example, so what I really would like is...
> t %>%
group_by(grp) %>%  ## This is replaced with regroup() in my function
summarise_each(funs(propmiss = sum(is.na) / length))
But the problem is that sum(is.na) doesn't work as I expect it to (likely because my expectation is wrong!)...
> t %>%
group_by(grp) %>%  ## This is replaced with regroup() in my function
summarise_each(funs(nmiss = sum(is.na)))
Error in sum(.Primitive("is.na")) : invalid 'type' (builtin) of argument
I tried calling is.na() explicitly with the brackets but that too returns an error...
> t %>%
group_by(grp) %>%  ## This is replaced with regroup() in my function
summarise_each(funs(nmiss = sum(is.na())))
Error in is.na() : 0 arguments passed to 'is.na' which requires 1
Any advice or pointers to documentation would be very gratefully received.
Thanks,
slackline
Here's a possibility, tested on a small data set with some NA:
df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))
df
#   a  b  c
# 1 1  1  1
# 2 1  1  1
# 3 1 NA  1
# 4 2  1 NA
# 5 2 NA NA
# 6 2 NA NA
df %>% 
  group_by(a) %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))
#   a         b c
# 1 1 0.3333333 0
# 2 2 0.6666667 1
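As for why the attempts in the question fail: the "expecting a single value" error suggests that each function must return one value per group and column, whereas is.na() returns one value per element. Writing sum(is.na) passes the function itself to sum() rather than calling it on the data. A minimal base R sketch of the difference (the vector x is made up for illustration and stands in for one group's values of a column):

```r
# Hypothetical data: one group's values for a single column, one of three missing
x <- c(1, 1, NA)

# is.na() returns one logical per element, so it is not a single summary value
flags <- is.na(x)

# Summing the flags and dividing by the length collapses them to one number
prop_missing <- sum(is.na(x)) / length(x)

# mean() of a logical vector gives the same proportion, since TRUE counts as 1
prop_missing_alt <- mean(is.na(x))
```

Either of the last two expressions, with . in place of x, is the kind of single-value function that summarise_each() can use.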
And because you asked for pointers to documentation: the . refers to each piece of the data, and is used in some Examples in ?summarize_each. It is described in the Arguments section of ?funs as a "dummy parameter", and is used in the Examples there. The . is also briefly described in the Arguments section of ?do: "... You can use . to refer to the current group"
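Since the goal is to wrap this in a function, here is a rough sketch of how that might look. It is not the summarise_each()/regroup() approach from the question: it assumes a newer dplyr (1.0 or later), where across() takes over the role of summarise_each() and all_of() lets the grouping column be passed as a string. The function name prop_missing and its group_col argument are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical helper: proportion of missing values in every non-grouping
# column, computed per group. `group_col` is the grouping column name as a string.
# Assumes dplyr >= 1.0 (across(), all_of(), and the .groups argument).
prop_missing <- function(data, group_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    # across(everything()) skips the grouping column inside summarise()
    summarise(across(everything(), ~ mean(is.na(.x))), .groups = "drop")
}

# Same small test data as above
df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

res <- prop_missing(df, "a")
```

Here mean(is.na(.x)) is used as shorthand for sum(is.na(.)) / length(.), so res should hold the same proportions as the summarise_each() result above.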