The function mutate from the R package 'dplyr' has a peculiar recycling feature for factors, in that it seems to return the factor as.numeric. In the following example y becomes what you would expect, whereas z is c(1,1):
library(dplyr)
df <- data_frame(x=1:2)
glimpse(df %>% mutate(y="A", z=factor("B")))
# Variables:
# $ x (int) 1, 2
# $ y (chr) "A", "A"
# $ z (int) 1, 1
Is there any rationale behind this, or is it a bug?
(I am using R 3.1.1 and dplyr 0.3.0.1.)
EDIT:
After posting this as an issue on GitHub, Romain Francois fixed it within hours! So if the above is a problem, use devtools::install_github to get the latest version:
library(devtools)
install_github("hadley/dplyr")
and then
library(dplyr)
df <- data_frame(x=1:2)
glimpse(df %>% mutate(y="A", z=factor("B")))
# Variables:
# $ x (int) 1, 2
# $ y (chr) "A", "A"
# $ z (fctr) B, B
Nice work Romain!
dplyr uses C++ to perform the actual mutate operation. Following the rabbit hole and noting this is an ungrouped mutation, we can use our trusty debugger to notice the following.
debugonce(dplyr:::mutate_impl)
# Inside of mutate_impl we do:
class(dots[[2]]$expr) # which is a "call"!
So now we know the type of our lazy expression. We eval the call and notice it is a supported type (unfortunately, R's TYPEOF macro claims factors are integers - we would need Rf_isFactor to discriminate).
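The point about TYPEOF can be seen from plain R, without touching dplyr. The following is only a small base-R sketch (the variable names are made up for illustration): typeof() reports the storage type, while the factor-ness lives entirely in the attributes.

```r
# A length-one factor, as in the question
z <- factor("B")

# Storage type as reported by R; for a factor this is "integer"
storage_type <- typeof(z)

# The underlying integer code (position of "B" in the levels)
codes <- as.integer(z)

# What makes it a factor is carried in the attributes
attr_names <- names(attributes(z))

print(storage_type)
print(codes)
print(attr_names)
print(is.factor(z))
```

So any code that dispatches on the storage type alone will treat a factor like an ordinary integer vector unless it also looks at the class or levels.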
So what happens next? We returned the result and we're done. If you have tried (df %>% mutate(y="A", z=factor(c("A","B"))))[[3]] already, you'll know that the issue is indeed the recycling.
Specifically, the C++ Gatherer object (which should really be checking for Rf_isFactor in addition to its current date check on INTSXPs) is using C++ templating to force a Vector<INTSXP> to be created (implicitly through constructor initialization - notice the arity 2 call in ConstantGathererImpl) without remembering to carry over the factor "label."
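To make the effect concrete, here is a rough base-R imitation of what is described above. It is not dplyr's actual code, just a sketch that assumes the recycling step fills a fresh integer vector with the factor's code; the names n, lost and kept are hypothetical.

```r
z <- factor("B")
n <- 2L  # number of rows to recycle to

# Recycling only the integer code: the levels and class are gone,
# which matches the (int) 1, 1 column seen in the question
lost <- rep(as.integer(z), length.out = n)

# Recycling the code and then copying the attributes back
# gives a proper factor again
kept <- rep(as.integer(z), length.out = n)
attributes(kept) <- attributes(z)

print(lost)
print(kept)
```

The second variant is the kind of thing the fix would need to do on the C++ side: keep the levels and class alongside the recycled integer codes.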
TLDR: At the C level in R, integers and factors have the same internal type according to the TYPEOF macro, and factors are a weird edge case.
Feel free to submit a pull request to dplyr; it's in active development, and Hadley and Romain are nice guys. You'll have to add an if statement here.