I've made a custom sum function that ignores NAs unless all values are NA. When I use it in dplyr, it returns odd results and I don't know why.
require(dplyr)
dta <- data.frame(year=2007:2013, rrconf=c(79, NaN ,474,2792,1686,3313,3456), enrolled=c(NaN,NaN,458,1222,1155,1906,2184))
sum0 <- function(x, ...){
  # remove NAs unless all are NA
  if(is.na(mean(x, na.rm=TRUE))) return(NA)
  else(sum(x, ..., na.rm=TRUE))
}
dta %>%
  group_by(year) %>%
  summarize(rrconf=sum0(rrconf), enrolled=sum0(enrolled))
gives me
Source: local data frame [7 x 3]

  year rrconf enrolled
1 2007     79       NA
2 2008     NA       NA
3 2009    474     TRUE
4 2010   2792     TRUE
5 2011   1686     TRUE
6 2012   3313     TRUE
7 2013   3456     TRUE
In this case it is only summing over one value, but in my bigger application it might sum over multiple values. Wrapping my sum0 function in as.integer() seems to fix it, but I couldn't tell you why.
Is this the correct way to work around this problem? Is there something obvious I'm missing?
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.2
loaded via a namespace (and not attached):
[1] assertthat_0.1 magrittr_1.0.1 parallel_3.1.0 Rcpp_0.11.2 tools_3.1.0
The issue seems to be that dplyr determines the column type from the first result returned. For enrolled, the first group returns NA, which is a logical value by default, so the column appears to be treated as logical and the later sums show up as TRUE. If you force the NA to be NA_real_ or NA_integer_, then you will be sorted:
##Just to show what NA normally does first:
class(NA)
#[1] "logical"
sum0 <- function(x, ...){
  # remove NAs unless all are NA
  if(is.na(mean(x, na.rm=TRUE))) return(NA_real_)
  else(sum(x, ..., na.rm=TRUE))
}
dta %>%
  group_by(year) %>%
  summarize(rrconf=sum0(rrconf), enrolled=sum0(enrolled))
#Source: local data frame [7 x 3]
#
#  year rrconf enrolled
#1 2007     79       NA
#2 2008     NA       NA
#3 2009    474      458
#4 2010   2792     1222
#5 2011   1686     1155
#6 2012   3313     1906
#7 2013   3456     2184
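As a rough base R sketch of the same idea, the snippet below checks the types of the different NA constants and of a function's return value, without needing dplyr. The name sum0_typed is hypothetical (it is not from the question), and it assumes the input is a double vector, as the columns in dta are. It also swaps the mean() test for all(is.na(x)), which should behave the same for this data but is not guaranteed to match in every edge case.

```r
# Typed missing-value constants in base R
typeof(NA)           # "logical"
typeof(NA_real_)     # "double"
typeof(NA_integer_)  # "integer"

# Hypothetical variant of sum0: returns a double NA when every value is missing,
# so the return type matches what sum() gives for double input
sum0_typed <- function(x, ...) {
  if (all(is.na(x))) return(NA_real_)
  sum(x, ..., na.rm = TRUE)
}

typeof(sum0_typed(c(NaN, NaN)))  # "double"
typeof(sum0_typed(c(458, NA)))   # "double"

# as.integer() turns the logical NA into an integer NA, which may be why
# the wrapper in the question appeared to help
typeof(as.integer(NA))           # "integer"
```

Note that wrapping everything in as.integer() would also truncate any non-integer sums, so returning NA_real_ is probably the safer choice when the data are doubles.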