
Remove all punctuation except apostrophes in R

I'd like to use R's gsub to remove all punctuation from a text except for apostrophes. I'm fairly new to regex but am learning.

Example:

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))

Current Output (no apostrophe in don't)

[1] "I like to chew gum but dont like bubble gum" 

Desired Output (I desire the apostrophe in don't to stay)

[1] "I like to chew gum but don't like bubble gum" 
Tyler Rinker asked Jan 02 '12 03:01



1 Answer

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)
[1] "I like to chew gum but don't like bubble gum"

The above regex is much more straightforward. The caret at the start of the bracket expression negates it, so the pattern matches every character that is not alphanumeric, whitespace, or an apostrophe, and gsub replaces each match with an empty string.
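For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name keep_apostrophes is made up for illustration and is not part of the original answer. The last call shows a different technique, a negative lookahead with perl = TRUE, which should remove punctuation only while skipping the apostrophe.

```r
# Hypothetical helper name, introduced only for illustration.
keep_apostrophes <- function(text) {
  # Negated bracket expression: drop anything that is not a letter, digit,
  # whitespace character, or a straight apostrophe.
  gsub("[^[:alnum:][:space:]']", "", text)
}

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
keep_apostrophes(x)

# gsub is vectorised, so a character vector works as well.
keep_apostrophes(c("it's 5 o'clock!", "rock & roll"))

# Alternative (a different technique): the lookahead (?!') rejects the
# apostrophe before [[:punct:]] gets a chance to match it.
gsub("(?!')[[:punct:]]", "", x, perl = TRUE)
```

Both calls on x should give the desired output from the question. The two patterns are not equivalent in general: the negated class removes every character that is not explicitly allowed, while the lookahead version removes only what [[:punct:]] matches. The negated class also keeps only the straight apostrophe ('), so a typographic apostrophe would most likely be stripped unless it is added to the bracket expression.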

Kay answered Sep 17 '22 21:09