I'm currently running the following code to clean my data from accent characters:
df <- gsub('Á|Ã', 'A', df)
df <- gsub('É|Ê', 'E', df)
df <- gsub('Í', 'I', df)
df <- gsub('Ó|Õ', 'O', df)
df <- gsub('Ú', 'U', df)
df <- gsub('Ç', 'C', df)
However, I would like to do it in just one line (using another function for it would be ok). How can I do this?
Try something like this
iconv(c('Á'), "utf8", "ASCII//TRANSLIT")
You can just add more elements to the c().
EDIT: it is machine dependent, check help(iconv)
Here is the R solution
mychar <- c('ÁÃÉÊÍÓÕÚÇ')
iconv(mychar, "latin1", "ASCII//TRANSLIT") # one line, as requested
[1] "AAEEIOOUC"
It an encoding problem, Normally you resolve it by indicating the right encoding. If you still want to use regular expression to do it , you can use gsubfn to write one liner solution:
library(gsubfn)
ll <- list('Á'='A', 'Ã'='A', 'É'='E',
'Ê'='E', 'Í'='I', 'Ó'='O',
'Õ'='O', 'Ú'='U', 'Ç'='C')
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,'ÁÃÉÊÍÓÕÚÇ')
[1] "AAEEIOOUC"
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,c('ÁÃÉÊÍÓÕÚÇ','ÍÓÕÚÇ'))
[1] "AAEEIOOUC" "IOOUC"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With