I'd like to use R's gsub to remove all punctuation from a text except for apostrophes. I'm fairly new to regex but am learning.
Example:
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))
Current Output (no apostrophe in don't)
[1] "I like to chew gum but dont like bubble gum"
Desired Output (I desire the apostrophe in don't to stay)
[1] "I like to chew gum but don't like bubble gum"
Using the [[:punct:]] regex class ensures that you really do remove all punctuation, and it can be done entirely within R.
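To see why the apostrophe disappears, it can help to test individual characters against the class. The sketch below is only an illustration; the `chars` vector and the `is_punct` name are made up for this example.

```r
# Test a handful of single characters against the POSIX class [[:punct:]].
chars <- c("'", "!", ",", "a", "1", " ")

# grepl() returns TRUE for each element that contains a punctuation character.
is_punct <- grepl("[[:punct:]]", chars)
names(is_punct) <- chars

# The apostrophe is reported as punctuation; letters, digits and spaces are not.
print(is_punct)
```

Because the apostrophe belongs to [[:punct:]], the class on its own cannot keep it; it has to be excluded explicitly.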
One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. The translate method typically takes a translation table, which we'll build using the str.maketrans() method.
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)
[1] "I like to chew gum but don't like bubble gum"
The above regex is much more straightforward. The leading caret (^) negates the bracket expression, so it matches every character that is not alphanumeric, whitespace, or an apostrophe, and gsub replaces each match with an empty string.
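As a rough comparison, the negated class can be set beside a second approach that is not part of the answer above: a negative lookahead with perl = TRUE, which still uses [[:punct:]] but skips the apostrophe. The variable names are made up for this sketch, and the two patterns are only expected to agree on plain ASCII input like this one.

```r
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Approach 1 (from the answer): drop everything that is not alphanumeric,
# whitespace, or an apostrophe.
by_negated_class <- gsub("[^[:alnum:][:space:]']", "", x)

# Approach 2 (an alternative, assuming PCRE via perl = TRUE): the lookahead
# (?!') refuses to match at an apostrophe, so only other punctuation is removed.
by_lookahead <- gsub("(?!')[[:punct:]]", "", x, perl = TRUE)

print(by_negated_class)
print(identical(by_negated_class, by_lookahead))
```

The difference is in what gets removed beyond punctuation: the negated class also strips anything else outside the listed classes (control characters, for example), while the lookahead version removes punctuation only.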