
R remove non-alphanumeric symbols from a string

Tags: regex, r

I have a string, and I want to remove all non-alphanumeric symbols from it and then put the remaining words into a vector. So this:

"This is a string.  In addition, this is a string!" 

would become:

>stringVector1

"This","is","a","string","In","addition","this","is","a","string"

I've looked at grep() but can't find an example that matches. Any suggestions?

screechOwl asked Jan 22 '12 05:01


People also ask

How do you remove non-alphanumeric characters from a string?

A common way to remove all non-alphanumeric characters from a string is with regular expressions. The idea is to replace every match of [^A-Za-z0-9] with an empty string, so that only alphanumeric characters remain. You can also use the regular expression [^\w], which is equivalent to [^a-zA-Z_0-9] and therefore keeps underscores as well.

How do I remove symbols from text in R?

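Answer: Use [^[:alnum:]] to remove punctuation and symbols such as ~! @#$%^&*(){}_+:"<>?,./;'[]-= and use [^a-zA-Z0-9] to also remove accented letters such as â í ü Â á ą ę ś ć. Either pattern works in R's regular-expression functions, such as gsub() or regexpr().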
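
As a minimal sketch of the difference between the two character classes in R (the sample string x is made up for illustration): the ASCII class drops accented letters, while what [:alnum:] keeps depends on the locale, so no output is shown for the second call.

```r
# Hypothetical sample string with accented letters, an underscore, and punctuation.
x <- "caf\u00e9 na\u00efve_2!"

# ASCII-only class: everything outside a-z, A-Z, 0-9 is removed,
# including the accented letters.
ascii_only <- gsub("[^a-zA-Z0-9]", "", x)
print(ascii_only)

# POSIX class: which letters count as alphanumeric depends on the locale,
# so accented letters are usually kept in a UTF-8 locale.
posix_alnum <- gsub("[^[:alnum:]]", "", x)
print(posix_alnum)
```

Both calls remove the space, the underscore, and the exclamation mark. They can differ only in how they treat letters outside the ASCII range.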

How do I remove non characters from a string?

To remove all non-alphanumeric characters from a string, call the replace() method, passing it a regular expression that matches all non-alphanumeric characters as the first parameter and an empty string as the second. The replace method returns a new string with all matches replaced.

How do you get rid of non-alphanumeric?

In PHP, non-alphanumeric characters can be removed with the preg_replace() function, which performs a regular-expression search and replace. It searches the string for matches of the given pattern and replaces each match with the replacement.


2 Answers

Here is an example:

    > str <- "This is a string. In addition, this is a string!"
    > str
    [1] "This is a string. In addition, this is a string!"
    > strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]
     [1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"
    [10] "string"
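
A possible variant, not taken from the answer above, is to split directly on runs of non-alphanumeric characters instead of deleting them first. This is only a sketch, and the two approaches differ on input such as "don't": the gsub() call above turns it into "dont", while the direct split yields "don" and "t".

```r
str <- "This is a string.  In addition, this is a string!"

# Treat every run of non-alphanumeric characters as one separator.
# strsplit() returns a list with one element per input string.
words <- strsplit(str, "[^[:alnum:]]+")[[1]]

# A separator at the very start of the string would produce a leading "",
# so drop empty tokens defensively.
words <- words[nzchar(words)]
print(words)
```

Which behaviour is preferable depends on whether characters inside a word, such as apostrophes, should join or separate the pieces.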
kohske answered Oct 02 '22 19:10


Another approach to this problem, using the stringr package:

    library(stringr)
    text = c("This is a string.  In addition, this is a string!")
    str_split(str_squish((str_replace_all(text, regex("\\W+"), " "))), " ")
    #[1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"        "string"
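  • str_replace_all(text, regex("\\W+"), " "): replaces each run of non-word characters with a single space. Note that \W treats the underscore as a word character, so underscores are kept.
  • str_squish(): trims leading and trailing whitespace and collapses repeated whitespace inside the string
  • str_split(): splits the string into pieces and returns a list with one element per input string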
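
For readers without stringr installed, the three steps above can be approximated in base R. This is a sketch rather than an exact equivalent: base R and stringr use different regex engines, and the variable names are illustrative.

```r
text <- "This is a string.  In addition, this is a string!"

# Step 1: replace each run of non-word characters with one space
# (base R counterpart of str_replace_all()).
replaced <- gsub("\\W+", " ", text)

# Step 2: collapse repeated whitespace and trim both ends
# (rough counterpart of str_squish()).
squished <- trimws(gsub("\\s+", " ", replaced))

# Step 3: split on single spaces; strsplit() returns a list, like str_split().
words <- strsplit(squished, " ", fixed = TRUE)[[1]]
print(words)

# \W does not match the underscore, so underscores survive step 1.
print(gsub("\\W+", " ", "snake_case value!"))
```

If underscores should also be removed, the character class [^[:alnum:]]+ can be used in step 1 in place of \\W+.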
Tho Vu answered Oct 02 '22 20:10