This is the error I receive when I try to run tolower() on a character vector from a file that cannot be changed (at least, not manually; it is too large):

Error in tolower(m) : invalid multibyte string X

The problem seems to be French company names containing the É character, although I have not investigated all of them (that is not possible to do manually either). It's strange, because I would have expected encoding issues to be identified during read.csv(), rather than during later operations.

Is there a quick way to remove these multibyte strings? Or perhaps a way to identify and convert them? Or even just ignore them entirely?


Error in tolower() invalid multibyte string

2 Answers

Here's how I solved my problem:

First, I opened the raw data in a text editor (Geany, in this case), clicked Properties, and identified the encoding.

After that, I used the iconv() function:

x <- iconv(x,"WINDOWS-1252","UTF-8")

To be more specific, I did this for every character column of the data.frame from the imported CSV. Note that I set stringsAsFactors=FALSE in my read.csv() call.

is_chr <- sapply(dat, is.character)  # TRUE for each character column
dat[is_chr] <- lapply(dat[is_chr], iconv, "WINDOWS-1252", "UTF-8")
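For the "identify, convert, or ignore" parts of the question, here is a minimal sketch in base R. The vector x is made-up example data, and the sketch assumes the source encoding really is Windows-1252, as with the file above. validUTF8() needs R 3.3.0 or later.

```r
# Hypothetical example data: "\xc9" is the single byte for É in Windows-1252,
# which is not a valid UTF-8 sequence on its own.
x <- c("ACME Inc.", "SOCI\xc9T\xc9 G\xc9N\xc9RALE")

# Identify: validUTF8() is FALSE for elements that are not valid UTF-8
bad <- !validUTF8(x)
which(bad)

# Convert: re-encode from the assumed source encoding to UTF-8
fixed <- iconv(x, from = "WINDOWS-1252", to = "UTF-8")
tolower(fixed)

# Ignore: drop bytes that cannot be converted (lossy, the É is simply removed)
stripped <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
```

Converting is usually preferable to stripping, since stripping silently changes the names. How the sub argument treats invalid bytes can also vary with the platform's iconv implementation, so it is worth checking the result on a few known rows.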

159

answered Sep 18 '22 15:09

Brandon Bertelsen

I was getting the same error. However, in my case it didn't appear when I was reading the file, but a bit later when processing it. I realised that I was getting the error because the file wasn't read with the correct encoding in the first place.

I found a much simpler solution (at least for my case) and wanted to share. I simply added encoding as below and it worked.

read.csv(<path>, encoding = "UTF-8")
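A related option, if the file is not actually UTF-8: per the read.table documentation, encoding only marks the strings as being in the declared encoding, while fileEncoding re-encodes the data as it is read. Below is a rough sketch of the latter. The temporary file and its contents are made up for illustration, and it assumes the session runs in a UTF-8 locale and that the file is Windows-1252.

```r
# Hypothetical demo file: a header plus one row containing the
# Windows-1252 byte for É (0xC9), which is not valid UTF-8 by itself.
path <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("name\nSOCI"), as.raw(0xC9),
           charToRaw("T"), as.raw(0xC9), charToRaw("\n"))
writeBin(bytes, path)

# fileEncoding declares the file's encoding so the data is re-encoded on read
dat <- read.csv(path, fileEncoding = "WINDOWS-1252", stringsAsFactors = FALSE)

validUTF8(dat$name)  # check that the column is now valid UTF-8
tolower(dat$name)    # should no longer raise the multibyte error
```

This avoids a separate iconv() pass over every column, at the cost of having to know the file's encoding up front.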

answered Sep 18 '22 15:09

Onur Ece


Tags:

r
