Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove all special characters from a string in R?

People also ask

How do I remove special characters from a string in R?

In this article, we are going to remove all special characters from strings in R Programming language. For this, we will use the str_replace_all() method to remove non-alphanumeric and punctuations which is available in stringr package.

How do I remove special characters from data in R?

To remove a character in an R data frame column, we can use gsub function which will replace the character with blank. For example, if we have a data frame called df that contains a character column say x which has a character ID in each value then it can be removed by using the command gsub("ID","",as.

How do I remove symbols from text in R?

Answer : Use [^[:alnum:]] to remove ~! @#$%^&*(){}_+:"<>?,./;'[]-= and use [^a-zA-Z0-9] to remove also â í ü Â á ą ę ś ć in regex or regexpr functions.


You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter or a number or a punctuatution mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.


Instead of using regex to remove those "crazy" characters, just convert them to ASCII, which will remove accents, but will keep the letters.

astr <- "Ábcdêãçoàúü"
iconv(astr, from = 'UTF-8', to = 'ASCII//TRANSLIT')

which results in

[1] "Abcdeacoauu"

Convert the Special characters to apostrophe,

Data  <- gsub("[^0-9A-Za-z///' ]","'" , Data ,ignore.case = TRUE)

Below code it to remove extra ''' apostrophe

Data <- gsub("''","" , Data ,ignore.case = TRUE)

Use gsub(..) function for replacing the special character with apostrophe