I'm trying to wrap some dplyr magic inside a function to produce a data.frame that I then print with xtable.
The ultimate aim is to have a dplyr version of this working, and reading around I came across the very useful summarise_each() function, which, after subsetting with regroup() (since this is within a function), I can then use to get all columns parsed.
The problem I've encountered (so far) is with calling is.na() from within summarise_each(funs(is.na)), as I'm told Error: expecting a single value.
I'm purposefully not posting my function just yet, but a minimal example follows (NB: this uses group_by(), whilst in my function I replace this with regroup())...
library(dplyr)
library(magrittr)
t <- data.frame(grp = rbinom(10, 1, 0.5),
                a   = as.factor(round(rnorm(10))),
                b   = rnorm(10),
                c   = rnorm(10))
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(is.na))
# Error: expecting a single value
Running this fails, and it's the call to is.na() that is the problem, since if I instead work out the number of observations in each (required to derive the proportion of missing) it works...
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(length))
# Source: local data frame [2 x 4]
#   grp a b c
# 1   0 8 8 8
# 2   1 2 2 2
The real problem, though, is that I do not need just is.na() within each column, but the sum(is.na()) as per the linked example, so what I really would like is...
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(propmiss = sum(is.na) / length))
But the problem is that sum(is.na) doesn't work as I expect it to (likely because my expectation is wrong!)...
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(nmiss = sum(is.na)))
# Error in sum(.Primitive("is.na")) : invalid 'type' (builtin) of argument
I tried calling is.na() explicitly with the brackets, but that too returns an error...
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(nmiss = sum(is.na())))
# Error in is.na() : 0 arguments passed to 'is.na' which requires 1
Any advice or pointers to documentation would be very gratefully received.
Thanks,
slackline
Here's a possibility, tested on a small data set with some NA:
df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))
df
#   a  b  c
# 1 1  1  1
# 2 1  1  1
# 3 1 NA  1
# 4 2  1 NA
# 5 2 NA NA
# 6 2 NA NA
df %>%
    group_by(a) %>%
    summarise_each(funs(sum(is.na(.)) / length(.)))
#   a         b c
# 1 1 0.3333333 0
# 2 2 0.6666667 1
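To see why the attempts in the question failed, it may help to look at the pieces in plain base R, outside of dplyr. This is only an illustrative sketch; the vector x and the names err, n_missing and prop_missing are made up for the example.

```r
# A small hypothetical vector with two missing values.
x <- c(1, NA, 3, NA)

# is.na(x) returns one logical per element, not a single value,
# which is why it cannot be used directly as a per-group summary.
is.na(x)

# sum(is.na) hands the function object itself to sum(), not its result,
# so sum() complains about the type of its argument.
err <- tryCatch(sum(is.na), error = function(e) conditionMessage(e))

# Calling is.na() on the data first gives a logical vector that sum() can count.
n_missing    <- sum(is.na(x))
prop_missing <- n_missing / length(x)
```

Inside funs(), the . plays the role that x plays here: it stands in for the column values of the current group.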
And because you asked for pointers to documentation: the . refers to each piece of the data, and is used in some Examples in ?summarize_each. It is described in the Arguments section of ?funs as a "dummy parameter", and is used in the Examples there. The . is also briefly described in the Arguments section of ?do: "... You can use . to refer to the current group"
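As a side note, the same per-group proportion can also be written with across(), assuming a dplyr version of 1.0.0 or later is installed. This is a sketch of an alternative spelling, not the approach used above; it reuses the small df from this answer, and the name propmiss is borrowed from the question.

```r
library(dplyr)

df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

# mean() of a logical vector is the share of TRUE values,
# so mean(is.na(.x)) equals sum(is.na(.x)) / length(.x).
# Within a grouped summarise(), everything() covers the non-grouping columns.
propmiss <- df %>%
    group_by(a) %>%
    summarise(across(everything(), ~ mean(is.na(.x))))

propmiss
```

Here .x is the lambda placeholder for the current column within the current group, much like the . in funs().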