I get a seemingly random error when using `group_by` in `dplyr`, as follows:
```r
dat %>% group_by(variable) %>% mutate(score = score[1])
```
where `dat` is a data.frame with a factor/character column `variable`, and `score` is a double. The error I get is this:
```
Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'
```
variable
.score[1]
).variable
or `score`. `variable` has only repeated entries "arima" and "prophet", and no `NA`s. It's driving me crazy...
I googled the error and couldn't find much help, apart from advice to narrow it down, which is how I arrived at this simple `group_by` computation. I have also restarted the R session, restarted the computer, and updated R and the `dplyr` package, which is now at version 1.0.7.
I use R version 4.1.1 on Ubuntu 20.04.3 LTS.
Any thoughts on what could possibly produce this error?
Edit: I cannot provide a reproducible example with simulated data, as the error only happens with this particular data set. I uploaded the data that causes the error here: https://filebin.net/9pywc544hsmgm2p3
Then run the following code:
```r
library(dplyr)

A <- readRDS("dat.rds") %>%
  group_by(variable) %>%
  mutate(score = score[1])
```
Interestingly, if you group `A[1:nrow(A), ]` instead of `A`, it works, even though it is the same data.
Edit 2: I could now run the computation a couple of times, but at some point I always get a fatal error for the same computation. I get the feeling that this is specific to my system, so I probably have to reinstall everything.
Edit 3: Doing `as.numeric(score)` solves the issue, so it seems something was wrong with the names attribute of `score`. However, the names do not look suspicious: they are all the same, "new_confirmed10".
That's an internal error in the C code that implements an R function. It should never occur in user code; this is definitely a bug somewhere. You can probably narrow down its location slightly by running `traceback()` immediately after seeing it. That lists all the R functions that were active when the error happened, and might reveal the R function that called the buggy C code.
Please put together a reproducible example (i.e. give us a way to construct `dat` so that we can reproduce the error on our systems). Someone here on SO will likely be able to locate the cause of the error, and may be able to suggest a workaround or tell the author of the code how to fix it.
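One common way to share such an example on SO is `dput()`, which prints R code that rebuilds an object together with its attributes. The sketch below uses a small toy data frame with hypothetical values standing in for `dat`. It assumes the problem lies in ordinary values; an object with corrupted attributes may not survive this round trip, in which case sharing the file itself is the fallback.

```r
# Toy stand-in for `dat` (hypothetical values, not the poster's data)
dat <- data.frame(variable = c("arima", "prophet", "arima", "prophet"),
                  score    = c(0.5, 1.2, 0.7, 1.1))

# Print R code that recreates the first rows, attributes included;
# paste that output into the question
dput(head(dat))
```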
If you can't do that, it's unlikely that anyone will be able to help you, but here's some general advice:

- Use `gdb` or `lldb` to do C-level debugging. If you haven't used those before, it's not easy.

EDITED to add:
After examining the uploaded `dat.rds` file, I think I've found part of the problem. Running this code:
```r
dat <- readRDS("dat.rds")
table(names(dat$score))
```
I can see that the names of `dat$score` are 744 copies of "new_confirmed10". Having repeated names is unusual, but legal. However, the length of `dat$score` is 1488 (the same as the number of rows of `dat`), and as far as I know it's never legal to have a different number of names than the length of the object. (If you only assign names to some elements, the others should get the name `NA`.)
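A minimal base-R check for this kind of malformed column might look like the sketch below. The helper name `malformed_name_cols` is hypothetical, and it only inspects the names attribute; other corrupted attributes would not be caught.

```r
# Hypothetical helper: return the names of columns whose "names"
# attribute exists but has a different length than the column itself
malformed_name_cols <- function(df) {
  is_bad <- vapply(df, function(col) {
    nm <- names(col)
    !is.null(nm) && length(nm) != length(col)
  }, logical(1))
  names(df)[is_bad]
}

# Usage on the uploaded file (assumes dat.rds is in the working directory):
# dat <- readRDS("dat.rds")
# malformed_name_cols(dat)
# c(length(names(dat$score)), length(dat$score))
```

On a well-formed data frame the helper returns `character(0)`, so any column it reports is worth inspecting with `attributes()`.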
When you run your code, something in the `mutate` call crashes because it assumes the object is well-formed, and it's not. So I don't think this is a bug in `dplyr`, but rather a bug in whatever code created the object in `dat.rds`. Do you have any record of how it was created?
I kind of solved it. `score` was a `Named num`. After doing `as.numeric(score)` it worked. I still don't know why the names attribute caused this error (the names seem normal), but so be it. I realized this once I exported the data as both `.csv` and `.rds`: the error occurred only when I reloaded the `.rds` data, so I figured it must be related to some data attribute.
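The workaround can be sketched on toy data (hypothetical values standing in for `dat`). `as.numeric()` and `unname()` both return a plain double vector without the names attribute. So that the sketch runs without `dplyr`, the grouped `mutate(score = score[1])` is replaced here by base R's `ave()`, which applies a function within each group; this is a swapped-in equivalent, not the poster's code.

```r
# Toy stand-in for `dat` (hypothetical values); `score` carries names,
# like the "Named num" described above
dat <- data.frame(variable = rep(c("arima", "prophet"), each = 2))
dat$score <- c(new_confirmed10 = 0.5, new_confirmed10 = 0.7,
               new_confirmed10 = 1.2, new_confirmed10 = 1.1)
str(dat$score)  # reports a Named num

# Drop the names attribute; as.numeric(dat$score) would do the same here
dat$score <- unname(dat$score)

# Base-R equivalent of group_by(variable) %>% mutate(score = score[1]):
# replace each score with the first score of its group
first_score <- ave(dat$score, dat$variable, FUN = function(s) s[1])
first_score
```

With `dplyr` loaded, the original `group_by()`/`mutate()` pipeline can then be run on the cleaned column.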