I use the write.xlsx()
function (from the openxlsx
package) to turn a list object into an excel spreadsheet, where each element of the list is converted into a "sheet" of the excel file. In the past, this function has been incredibly useful, and I have never encountered any issues. It is my understanding that this package and function in particular does not need any particular java update on the computer in order for it to work.
However, recently I discovered that the function is producing error. This is what it states in the console when I run the write.xlsx() for the list:
Error in gsub("&", "&", v, fixed = TRUE) :
input string 5107 is invalid UTF-8
I've identified the dataframes that are the cause of the issue, but I am not sure how to identify which part of the data frame is causing the error.
I've even went ahead and used the enc2utf8()
function for all of the columns in this data frame in particular but I still encounter the error. I've used the substr()
function on the data frame itself, and it shows me the first n
characters of each column, though I do not see any obvious issues from the output.
I've even went ahead and used the install.packages()
function to re-download the openxlsx
package again, in case of any updates.
Does anyone know how I would go about identifying the cause of the error? Is it the function as it is written in the package? If the problem is in the encoding of the data itself, does the enc2utf8()
not suffice to resolve the issue?
Thanks!
I just had this same problem. Building on this question, you could replace all bad characters in the dataframe with either:
library(dplyr)
df %>%
mutate_if(is.character, ~gsub('[^ -~]', '', .))
for only character columns, or:
df %>%
mutate_all(~gsub('[^ -~]', '', .))
for all columns, and then export to XLSX with write.xlsx()
.
As far as finding the error, the number given points you to the problem (in your case, 5107). This appears to be counting the strings that are written to the file. To find the particular data point that's the issue, this approach worked for me:
Let's assume our data frame has 20 variables and 10 of them are character type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With