What is the best way to remove German (or French) accents from a vector of 16 million strings, e.g. to turn 'Sjögren's syndrome' into 'Sjogren's syndrome'? Converting each accented character into a single character is preferred over transliterations such as ä => ae, ö => oe, ü => ue. Nested regular-expression substitutions would be one option, but is there something better, such as an R package for this? <code>gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))</code> There are SO solutions for other platforms, but I have not found a good one for R.

Convert accented characters into ASCII characters

2 Answers

Use iconv to convert to ASCII with transliteration (if supported):

iconv(c("über", "Sjögren's"), to = "ASCII//TRANSLIT")
[1] "uber"      "Sjogren's"
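Because `//TRANSLIT` support and its exact output depend on the platform's iconv implementation (and, on some systems, on the current locale), it may be worth checking the result on a small sample before converting all 16 million strings. A minimal sketch in base R, assuming the input is UTF-8; `strip_accents_iconv` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: transliterate UTF-8 strings to ASCII with iconv().
# Assumption: the input is UTF-8. The output is platform-dependent.
strip_accents_iconv <- function(x) {
  out <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  # iconv() returns NA for elements it cannot convert
  n_failed <- sum(is.na(out) & !is.na(x))
  if (n_failed > 0) {
    warning(n_failed, " string(s) could not be converted")
  }
  out
}

# "\u00fc" is ü and "\u00f6" is ö; escapes keep the script encoding-safe
x <- c("\u00fcber", "Sj\u00f6gren's syndrome")
res <- strip_accents_iconv(x)
print(res)

# Check that every converted string is NA or pure ASCII
ascii_ok <- vapply(
  res,
  function(s) is.na(s) || all(utf8ToInt(s) < 128L),
  logical(1)
)
print(all(ascii_ok))
```

If the sample shows substitutes such as `?` or stray quote characters instead of plain letters, the iconv build on that machine is probably not suitable for this task.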

answered Oct 25 '22 14:10

James

One of the linked answers suggests using the stringi package:

library(stringi)
stri_trans_general("Zażółć gęślą jaźń", "Latin-ASCII")
[1] "Zazolc gesla jazn"
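`stri_trans_general` is vectorized, so it can be applied to the whole vector directly. For 16 million strings, one possible refinement is to transliterate only the distinct values and map the results back, which may help when many strings repeat. A sketch under that assumption; `strip_accents_unique` is a hypothetical helper, not part of stringi:

```r
library(stringi)

# Hypothetical helper: transliterate each distinct string once, then
# map the results back to the original positions.
strip_accents_unique <- function(x) {
  ux <- unique(x)
  ascii <- stri_trans_general(ux, "Latin-ASCII")
  ascii[match(x, ux)]
}

# "\u00fc" is ü and "\u00f6" is ö
x <- c("\u00fcber", "Sj\u00f6gren's syndrome", "\u00fcber")
res <- strip_accents_unique(x)
print(res)
```

Whether the deduplication step pays off depends on how many of the strings are repeated; if nearly all are distinct, calling `stri_trans_general(x, "Latin-ASCII")` on the full vector is simpler.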

answered Oct 25 '22 14:10

userJT
