
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character' in dplyr group_by

Tags: r, dplyr

I get a seemingly random error when using group_by in dplyr, as follows:

dat %>% group_by(variable) %>% mutate(score = score[1])

where dat is a data.frame with factor/character column variable and score is a double. The error I get is this:

Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'
  • It happens regardless of the name of variable.
  • Regardless of the computation (e.g. score[1]).
  • I can group by any other variable.
  • Both variables seem fine. Taken individually, I can do all sorts of computations with variable or score. variable has only the repeated entries "arima" and "prophet", no NAs.
  • Also, the error message shows up only sometimes. Most often the R session simply terminates due to a fatal error.

It's driving me crazy...

I googled the error and couldn't find good help, except for advice to narrow it down, which is what I did, i.e. I reduced it to a simple group_by computation. Further, I restarted the R session, restarted the computer, and updated my R version and the dplyr package, which is now on version 1.0.7. I use R version 4.1.1 on Ubuntu 20.04.3 LTS.

Any thoughts on what could possibly produce this error?

Edit: I cannot provide a reproducible sample with simulated data as it only happens with the particular data. Here I uploaded the data that is causing the error: https://filebin.net/9pywc544hsmgm2p3

Then run the following code

A <- readRDS("dat.rds") %>% 
    group_by(variable) %>% 
    mutate(score = score[1])

Interestingly, if you group by A[1:nrow(A), ] instead of A it works, although it is the same data.

Edit 2: I could now run the computation a couple of times, but at some point I always get a fatal error for the same computation. I get the feeling that this seems specific to my system. So probably I have to reinstall everything.

Edit 3: Doing as.numeric(score) solves the issue. So it seems something was up with the name attributes of score. However, they do not look suspicious and all have the same name, which is "new_confirmed10".

Nic asked Oct 29 '21 10:10


2 Answers

That's an internal error in C code that implements an R function. It should never occur in user code. This is definitely a bug somewhere. You can probably narrow down the location slightly by running traceback() immediately after seeing it. That will list all the R functions that are active at the time the error happened, and might give the location of the R function that called the buggy C code.
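Interactively, traceback() is simply typed at the prompt right after the error appears. In a non-interactive script the same information can be captured with a calling handler. The following is only a minimal sketch: outer_fn and inner_fn are made-up stand-ins for whatever functions are on the stack, not anything from dplyr.

```r
# Made-up functions that stand in for the real call chain.
inner_fn <- function(x) stop("simulated failure")
outer_fn <- function(x) inner_fn(x)

captured <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer_fn(1),
    # The calling handler runs while the failing calls are still on the stack,
    # so sys.calls() sees them.
    error = function(e) captured <<- sys.calls()
  ),
  # The exiting handler then turns the error into its message.
  error = function(e) conditionMessage(e)
)

# One line of text per active call, outermost first.
stack <- vapply(as.list(captured), function(cl) deparse(cl)[1], character(1))
print(stack)
```

Note that this only helps when R raises a regular error. When the session terminates with a fatal error, as described in the question, no handler gets a chance to run.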

Please put together a reproducible example (i.e. give us a way to construct dat so that we can reproduce it on our systems). Someone here on SO will be able to locate the cause of the error, and maybe be able to tell you a workaround, or tell the author of the code how to fix it.

If you can't do that, it's unlikely that anyone will be able to help you, but here's some general advice:

  • Don't use packages or package versions that are only on GitHub. Stick to CRAN packages, which have generally received better testing.
  • If you are developing your own C/C++ code and it is somehow involved here, you need to use gdb or lldb to do C-level debugging. If you haven't used those before, it's not easy.

EDITED to add:

After examining the uploaded version of the dat.rds file, I think I've discovered part of the problem. Running this code:

dat <- readRDS("dat.rds")
table(names(dat$score))

I can see that the names of dat$score are 744 copies of "new_confirmed10". Having repeated names is unusual, but legal. However, the length of dat$score is 1488 (the same as the number of rows of dat), and as far as I know it's never legal to have a different number of names than the length of the object. (If you only assign names to some elements, you should get name NA on the others.)
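As a rough illustration of the check described above: for a well-formed vector, base R pads missing names with NA, so the number of names always equals the length. The helpers names_match_length and check_columns are hypothetical names introduced here, and the small data frame is made up (it is not the uploaded dat.rds).

```r
# Well-formed case: assigning too few names pads the rest with NA.
x <- c(1.5, 2.5, 3.5, 4.5)
names(x) <- c("a", "a")
padded_names <- names(x)
print(padded_names)

# Hypothetical helper: TRUE if a vector has no names,
# or exactly as many names as it has elements.
names_match_length <- function(v) {
  nm <- names(v)
  is.null(nm) || length(nm) == length(v)
}

# Hypothetical helper: run the check on every column of a data frame.
check_columns <- function(df) vapply(df, names_match_length, logical(1))

# Made-up example data; a malformed column would be reported as FALSE.
ok <- check_columns(data.frame(variable = c("arima", "prophet"), score = c(1, 2)))
print(ok)
```

Running check_columns(readRDS("dat.rds")) on the uploaded file would presumably flag score, assuming names() reports the 744 stored names as it did in the table() call above.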

When you run your code, something in the mutate call crashes, because it assumes the object is well-formed, and it's not. So this isn't a bug in dplyr, but I think it's a bug in whatever code created the object in dat.rds. Do you have any record of how it was created?

user2554330 answered Oct 21 '22 13:10


I kind of solved it. score was a Named num. After doing as.numeric(score) it worked. I still don't know why the name attributes caused this error (they all seem normal), but so be it. I realized this once I exported the data as .csv and .rds and the error occurred only when I reloaded the .rds data, so I figured it must be related to some data attributes.
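For anyone wanting to see what as.numeric() changes, here is a small sketch on made-up values (not the real data). It only shows that the names attribute is dropped. Using unname() on every column is an assumption on my part and was not tested against the problematic file.

```r
# Made-up named numeric vector, similar in shape to the score column.
score <- c(0.9, 0.8, 0.7)
names(score) <- rep("new_confirmed10", length(score))
str(score)   # reported as a "Named num"

# as.numeric() returns the values without the names attribute.
clean <- as.numeric(score)
had_names <- !is.null(names(score))
has_names <- !is.null(names(clean))

# Assumed alternative: strip names from every column of a (made-up) data frame.
dat <- data.frame(variable = c("arima", "arima", "prophet"), score = clean)
dat[] <- lapply(dat, unname)
str(dat)
```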

Nic answered Oct 21 '22 15:10