Remove accents from a dataframe column in R

Tags:

r

diacritics

I have a data.table called base, with a character column terme:

    > class(base$terme)
    [1] "character"
    > length(base$terme)
    [1] 27486

I can remove accents from a single string, and also from a vector of strings:

    > iconv("Millésime", to = "ASCII//TRANSLIT")
    [1] "Millesime"
    > iconv(c("Millésime", "boulangère"), to = "ASCII//TRANSLIT")
    [1] "Millesime"  "boulangere"

But for some reason, it does not work when I apply the very same function to my terme column:

    > base$terme[2]
    [1] "Millésime"
    > iconv(base$terme[2], to = "ASCII//TRANSLIT")
    [1] "MillACsime"

Does anybody know what is going on here?


asked Aug 25 '16 15:08

hans glick

1 Answer

It might be easier to use the stringi package. This way, you don't need to check the encoding beforehand. Furthermore, stringi behaves consistently across operating systems, whereas iconv's transliteration does not.

    library(stringi)
    library(data.table)  # needed for data.table() and :=

    base <- data.table(terme = c("Millésime",
                                 "boulangère",
                                 "üéâäàåçêëèïîì"))

    # Transliterate the column in place, by reference
    base[, terme := stri_trans_general(str = terme,
                                       id = "Latin-ASCII")]

    > base
               terme
    1:     Millesime
    2:    boulangere
    3: ueaaaaceeeiii
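As for what is going on in the question: a likely cause, assuming the column holds UTF-8 text while the session's native encoding is Latin-1 or Windows-1252, is that iconv() called without a from argument treats its input as being in the native encoding. The two UTF-8 bytes of "é" are then read as the two characters "Ã" and "©", which transliterate to "A" and "C". Here is a minimal base R sketch of that effect (the names x, misread and fixed are only illustrative):

```r
# Build the string from a Unicode escape so it is UTF-8 regardless of
# how this script file itself is saved.
x <- enc2utf8("Mill\u00e9sime")

# Simulate the failure: decode the UTF-8 bytes as if they were Latin-1.
# The two bytes of "é" (C3 A9) become two separate characters.
misread <- iconv(x, from = "latin1", to = "UTF-8")

nchar(x)        # 9
nchar(misread)  # 10, one extra character from the misread "é"

# Naming the source encoding explicitly avoids the wrong guess.
# The exact transliteration still depends on the platform's iconv.
fixed <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
```

If that assumption holds for your data, passing from = "UTF-8" to iconv() on the terme column should avoid the "MillACsime" result, although the transliteration itself still varies between platforms, which is the reason to prefer stringi.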

answered Sep 21 '22 19:09

Jeldrik
