Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concatenate gsub [duplicate]

I'm currently running the following code to clean my data from accent characters:

df <- gsub('Á|Ã', 'A', df)
df <- gsub('É|Ê', 'E', df)
df <- gsub('Í',   'I', df)
df <- gsub('Ó|Õ', 'O', df)
df <- gsub('Ú',   'U', df)
df <- gsub('Ç',   'C', df)

However, I would like to do it in just one line (using another function for it would be ok). How can I do this?

like image 242
Waldir Leoncio Avatar asked Mar 30 '26 23:03

Waldir Leoncio


2 Answers

Try something like this

iconv(c('Á'), "utf8", "ASCII//TRANSLIT")

You can just add more elements to the c().

EDIT: it is machine dependent, check help(iconv)

Here is the R solution

mychar <- c('ÁÃÉÊÍÓÕÚÇ')
iconv(mychar, "latin1", "ASCII//TRANSLIT") # one line, as requested
[1] "AAEEIOOUC"
like image 189
Usobi Avatar answered Apr 02 '26 12:04

Usobi


It an encoding problem, Normally you resolve it by indicating the right encoding. If you still want to use regular expression to do it , you can use gsubfn to write one liner solution:

library(gsubfn)
ll <- list('Á'='A', 'Ã'='A', 'É'='E',
           'Ê'='E', 'Í'='I', 'Ó'='O',
           'Õ'='O', 'Ú'='U', 'Ç'='C')
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,'ÁÃÉÊÍÓÕÚÇ')
[1] "AAEEIOOUC"
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,c('ÁÃÉÊÍÓÕÚÇ','ÍÓÕÚÇ'))
[1] "AAEEIOOUC" "IOOUC"   
like image 45
agstudy Avatar answered Apr 02 '26 13:04

agstudy