Concatenate gsub [duplicate]

Question

I'm currently running the following code to strip accented characters from my data:

df <- gsub('Á|Ã', 'A', df)
df <- gsub('É|Ê', 'E', df)
df <- gsub('Í',   'I', df)
df <- gsub('Ó|Õ', 'O', df)
df <- gsub('Ú',   'U', df)
df <- gsub('Ç',   'C', df)

However, I would like to do it in a single line (using a different function would be fine). How can I do this?

Usobi · Accepted Answer

Try something like this:

iconv(c('Á'), "utf8", "ASCII//TRANSLIT")

You can add more elements to the c() call.

EDIT: the result is platform-dependent; see help(iconv).

Here is the R solution:

mychar <- c('ÁÃÉÊÍÓÕÚÇ')
iconv(mychar, "latin1", "ASCII//TRANSLIT") # one line, as requested
[1] "AAEEIOOUC"
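
Since the result of //TRANSLIT depends on the platform, it may help to wrap the call and check what comes back. The following is only a sketch: strip_accents is a hypothetical helper name, and it assumes the input can be converted to UTF-8 with enc2utf8(). iconv() returns NA for elements it cannot convert, so the helper warns in that case. No expected output is shown because it can differ between machines.

```r
# Hypothetical helper (not part of the original answer): transliterate
# accented characters to ASCII and flag elements that could not be converted.
strip_accents <- function(x) {
  x <- enc2utf8(x)  # assumption: input is convertible to UTF-8
  out <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  # iconv() yields NA for strings it cannot convert; ignore NAs already in x
  if (anyNA(out[!is.na(x)])) {
    warning("some elements could not be transliterated on this platform")
  }
  out
}

# Works on a whole character vector at once
strip_accents(c('ÁÃÉÊÍÓÕÚÇ', 'ÍÓÕÚÇ'))
```

Some iconv implementations substitute "?" or add extra characters instead of returning NA, so it is worth inspecting the result on your own machine.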

agstudy · Answer

It is an encoding problem. Normally you resolve it by specifying the right encoding. If you still want to use regular expressions, you can use gsubfn to write a one-line solution:

library(gsubfn)
ll <- list('Á'='A', 'Ã'='A', 'É'='E',
           'Ê'='E', 'Í'='I', 'Ó'='O',
           'Õ'='O', 'Ú'='U', 'Ç'='C')
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,'ÁÃÉÊÍÓÕÚÇ')
[1] "AAEEIOOUC"
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,c('ÁÃÉÊÍÓÕÚÇ','ÍÓÕÚÇ'))
[1] "AAEEIOOUC" "IOOUC"
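
If installing gsubfn is not an option, the same idea can be sketched in base R by folding the question's gsub() calls over a named vector with Reduce(). This is an illustration rather than something from either answer: accent_map and gsub_all are hypothetical names, and the example assumes the script is read in a UTF-8 locale.

```r
# Pattern -> replacement pairs, taken from the six gsub() calls in the question
accent_map <- c('Á|Ã' = 'A', 'É|Ê' = 'E', 'Í' = 'I',
                'Ó|Õ' = 'O', 'Ú' = 'U', 'Ç' = 'C')

# Hypothetical helper: apply each gsub() in turn, passing the result forward.
# Reduce() starts from x and calls the function once per pattern name.
gsub_all <- function(x, map) {
  Reduce(function(acc, pattern) gsub(pattern, map[[pattern]], acc),
         names(map), x)
}

gsub_all(c('ÁÃÉÊÍÓÕÚÇ', 'ÍÓÕÚÇ'), accent_map)
# [1] "AAEEIOOUC" "IOOUC"
```

The call itself is one line, and adding a new replacement only means adding an entry to accent_map.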

Tags: regex, optimization, r, gsub

Waldir Leoncio
