I have text in German and I want to replace all umlauts (ä, Ä, ü, Ü, ö, Ö) with ae, oe, ue, etc.
I can do it separately (by saving each substitution into a new file):
gsub(pattern = '[ä]', replacement = "ae",text)
gsub(pattern = '[ü]', replacement = "ue",text)
gsub(pattern = '[ö]', replacement = "oe",text)
But can I do it in one command (including substituting capital letters with Ae, Oe and Ue, etc.)?
Can I do it by regex?
Some solutions here may or may not work, depending on the locale of the OS running R and the encoding of the input string. I have hit this problem many times on different operating systems and language settings. I currently develop in R on a German Windows 10 machine but sometimes run the code on an English Ubuntu VM.
A very fast solution that works reliably under both Windows and Ubuntu, with both de_DE and en_US locales, is this one: https://github.com/gagolews/stringi/issues/269#issuecomment-488623874
> stringi::stri_trans_general("ä ö ü ß", "de-ASCII; Latin-ASCII")
[1] "ae oe ue ss"
The ; inside the ICU transform ID makes this a 'compound ID'. See ?stri_trans_general for more info.
You could try
# install.packages("stringi") # uncomment & run if needed
str <- c("äöü", "ÄÖÜ")
stringi::stri_replace_all_fixed(
str,
c("ä", "ö", "ü", "Ä", "Ö", "Ü"),
c("ae", "oe", "ue", "Ae", "Oe", "Ue"),
vectorize_all = FALSE
)
# [1] "aeoeue" "AeOeUe"
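If adding a package is not an option, the same ordered replacement can be sketched in base R by looping gsub() over two parallel vectors. This is only a minimal sketch: the helper name replace_umlauts is made up for illustration, and the umlauts are written as \u escapes on the assumption that this sidesteps the source-file encoding problems mentioned above.

```r
# Hypothetical helper (not from any package): replace German umlauts
# with their two-letter ASCII spellings using base R only.
replace_umlauts <- function(x) {
  # \u escapes: ä, ö, ü, Ä, Ö, Ü. Assumption: writing them this way keeps
  # the script independent of the encoding it is saved in.
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
  to   <- c("ae", "oe", "ue", "Ae", "Oe", "Ue")
  for (i in seq_along(from)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

replace_umlauts(c("\u00e4\u00f6\u00fc", "\u00c4\u00d6\u00dc"))
# [1] "aeoeue" "AeOeUe"
```

Like the separate gsub() calls in the question, this makes one pass per character, but it wraps them in a single function call.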