 

Handling special characters, e.g. accents, in R

Tags:

r

character

I am scraping names from the web into a data frame.

For a name such as "Tomáš Rosický", I get a result "TomÃ¡Å¡ RosickÃ½".

I tried

Encoding("TomÃ¡Å¡ RosickÃ½") # returns "latin1"

but I was not sure where to go from there to get the original name, with its accents, back. I played around with iconv without success.

I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".

pssguy asked Mar 01 '12 05:03





2 Answers

You've read in a page encoded in UTF-8, but R is treating the strings as latin1. If x is your column of names, declare the correct encoding with Encoding(x) <- "UTF-8".
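
As a minimal sketch of what that assignment does, the example below builds the UTF-8 bytes of the name by hand, labels them as latin1 to imitate the situation in the question, and then relabels them as UTF-8. The variable names (utf8_bytes, x, n_chars, ascii_name) are illustrative only, and the chartr mapping at the end is a hand-written assumption that covers just the three accented letters in this one name.

```r
# Sketch only: the bytes are written out by hand so the example does not
# depend on the encoding of this file or on the session locale.
utf8_bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                       0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(utf8_bytes)

# Imitate the situation in the question: UTF-8 bytes declared as latin1.
Encoding(x) <- "latin1"

# Re-declare the encoding. This changes only the label; the bytes are untouched.
Encoding(x) <- "UTF-8"
stopifnot(validUTF8(x))
n_chars <- nchar(x)  # counts characters, not bytes

# Optional ASCII form via an explicit mapping (assumed and far from complete).
ascii_name <- chartr("\u00e1\u0161\u00fd", "asy", x)
```

For names beyond this one, iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT") may be a more general way to drop accents, but its output depends on the platform's iconv implementation, so it is worth checking the result on your own system.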

Hong Ooi answered Sep 17 '22 18:09



You should use this:

df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
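
One case where this conversion seems to help is when the garbled characters themselves have been stored as UTF-8 text. The sketch below assumes that situation; the names garbled, repaired and expected are illustrative. Note that applying the same conversion to a correctly decoded name would typically give NA, because "š" has no Latin-1 code point.

```r
# The garbled characters, written as Unicode escapes so the example is
# independent of file encoding: "TomÃ¡Å¡ RosickÃ½".
garbled <- "Tom\u00c3\u00a1\u00c5\u00a1 Rosick\u00c3\u00bd"

# Converting to latin1 turns each garbled character back into a single byte,
# which recovers the original UTF-8 byte sequence.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")

# By default iconv() labels this result as latin1, so tell R that the
# bytes are really UTF-8.
Encoding(repaired) <- "UTF-8"

# The intended name, again as escapes, for comparison.
expected <- "Tom\u00e1\u0161 Rosick\u00fd"
stopifnot(repaired == expected)
```

If the strings already hold the right bytes and only carry the wrong label, relabelling with Encoding() alone is enough and no iconv call is needed.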
Roadkill answered Sep 20 '22 18:09
