
read_excel correctly imports file, but "invalid multibyte string" error when trying to put it on a list

Tags: r, readxl

When I read any of the sheets from the file Posti-Letto-Istat.xls with read_excel from the readxl package I have no problems:

library(readxl)
pl_istat1 <- read_excel(path = "data/Posti-Letto-Istat.xls", sheet = 1, range = "A6:I66", na = "....")

However, if I use lapply or a for loop to collect all three sheets in a list, I get the following error:

lapply(1:3, function(i) read_excel(path = "data/Posti-Letto-Istat.xls", sheet = i, range = "A6:I66", na = "....")) 

Error in nchar(x, type = "width") : invalid multibyte string, element 4

I see that it's an encoding issue, and if I do something like

names(pl_istat[[i]]) <- iconv(enc2utf8(names(pl_istat[[i]])),sub="byte")

to each sheet, the error goes away.
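Applied to every element of the list, that workaround might look like the sketch below. The helper name fix_names and the stand-in data frame are hypothetical (they replace the sheets read from the .xls file so the example runs on its own), and the sketch assumes the problematic names are latin1 bytes.

```r
# Hypothetical helper (not part of readxl): re-encode column names,
# using the same expression as the workaround above.
fix_names <- function(df) {
  names(df) <- iconv(enc2utf8(names(df)), sub = "byte")
  df
}

# Stand-in for one imported sheet; "\xe0" is "a with grave" in latin1 (assumption).
nm <- "Citt\xe0"
Encoding(nm) <- "latin1"
sheet <- data.frame(x = 1:2)
names(sheet) <- nm

# Stand-in for the list of three sheets.
pl_istat <- list(sheet, sheet, sheet)

# Re-encode the names of every sheet in the list.
pl_istat <- lapply(pl_istat, fix_names)
```

With the real file, the read_excel() call would take the place of the stand-in data frame.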

However, is there a way to have the list accept the tibble that gets correctly imported by readxl?

My Session Info:

R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.0.0

loaded via a namespace (and not attached):
[1] magrittr_1.5     assertthat_0.2.0 R6_2.2.2         tools_3.3.3      bindrcpp_0.2     glue_1.1.1       dplyr_0.7.3      tibble_1.3.4     Rcpp_0.12.12
[10] cellranger_1.1.0 rematch_1.0.1    pkgconfig_2.0.1  rlang_0.1.2      bindr_0.1

asked Sep 30 '17 13:09 by Max


2 Answers

I had the same error and solved it by wrapping read_excel() in as.data.frame():

lapply(
  1:3, 
  function(i) {
    as.data.frame(
      read_excel(path = "data/Posti-Letto-Istat.xls", sheet = i, range = "A6:I66", na = "....")
    )
  }
) 

answered Nov 19 '22 11:11 by sindri_baldur


I had a similar problem when trying to save tibbles created with readxl into a list. Since I had multiple header rows, I read only the headers first, concatenated them, and stored the resulting column names in a vector named headers. Then I read the actual data with read_excel and col_names = FALSE. I had no problem saving these "nameless" tibbles into a list, but if I renamed the columns using headers I'd get this error:

Error in nchar(x[is_na], type = "width") : 
  invalid multibyte string, element 1

I solved the issue by changing the encoding before renaming the tibble:

headers <- enc2native(headers)

But after this, when the list is printed I get this warning:

In fansi::strwrap_ctl(x, width = max(width, 0), indent = indent,  :
  Encountered a C0 control character, see `?unhandled_ctl`; you can use `warn=FALSE` to turn off these warnings.

which appears to be caused by a bug in base R according to 1 and 2, and it hasn't been an issue for me.
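
To see what a conversion like this does to the header strings, it may help to inspect their encoding marks before and after. The following is a minimal sketch with a made-up headers vector (the values are hypothetical, and the non-ASCII element is assumed to be latin1):

```r
# Hypothetical header vector; "\xe0" is "a with grave" in latin1 (assumption).
headers <- c("Regione", "Citt\xe0")
Encoding(headers) <- "latin1"

# Declared encoding of each element (pure ASCII strings are never marked).
Encoding(headers)

# Whether the raw bytes of each element form valid UTF-8.
validUTF8(headers)

# Convert to UTF-8; the result is the same in every locale.
headers_utf8 <- enc2utf8(headers)

# Convert to the session's native encoding; the result depends on the locale.
headers_native <- enc2native(headers)
```

enc2utf8() is shown next to enc2native() only for comparison. Which of the two avoids the error is likely to depend on the session locale.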

answered Nov 19 '22 11:11 by DaríoM