I need to use regex to strip punctuation at the start and end of a word. It seems like regex would be the best option for this. I don't want punctuation removed from words like 'you're', which is why I'm not using .replace().
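Since the question asks for regex specifically, here is a minimal sketch that strips only leading and trailing punctuation. It assumes "punctuation" means the ASCII characters in `string.punctuation`. The names `EDGE_PUNCT` and `strip_edge_punct` are hypothetical, chosen for illustration.

```python
import re
import string

# Build a character class from string.punctuation; re.escape protects
# characters such as ']', '\\', '^' and '-' inside the brackets.
PUNCT = re.escape(string.punctuation)

# Match a run of punctuation at the start OR a run at the end of the word.
EDGE_PUNCT = re.compile(rf"^[{PUNCT}]+|[{PUNCT}]+$")

def strip_edge_punct(word: str) -> str:
    """Remove punctuation only at the start and end of word (hypothetical helper)."""
    return EDGE_PUNCT.sub("", word)

sentence = "\"Hello, world. I'm a boy, you're a girl.\""
print(" ".join(strip_edge_punct(w) for w in sentence.split()))
```

Because the anchors `^` and `$` only match at the edges of the word, the apostrophe inside "you're" is left alone.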
One of the easiest ways to remove punctuation from a string in Python is the str.translate() method. translate() takes a translation table, which we'll build with the str.maketrans() method.
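A short sketch of the translate()/maketrans() approach. It assumes ASCII punctuation from `string.punctuation`. Note that it deletes every punctuation character, including the apostrophe in "you're", so on its own it does not meet the requirement in the question.

```python
import string

# The third argument to str.maketrans lists characters that map to None,
# which translate() then deletes. The first two arguments are left empty.
table = str.maketrans("", "", string.punctuation)

text = "Hello, world! I'm a boy; you're a girl."
print(text.translate(table))
```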
We can use the replace() method to remove punctuation from a Python string by replacing each punctuation mark with an empty string. We iterate over the punctuation marks one by one and replace each of them with an empty string in our text.
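The loop described above might look like this. It is a sketch assuming `string.punctuation` as the set of marks; `remove_punct_replace` is a hypothetical name. Like translate(), it also removes apostrophes inside words.

```python
import string

def remove_punct_replace(text: str) -> str:
    # One replace() call per punctuation character; each call returns a new string.
    for ch in string.punctuation:
        text = text.replace(ch, "")
    return text

print(remove_punct_replace("Hello, world! I'm a boy."))
```

This makes one pass over the text per punctuation character, so translate() is usually the better choice for long strings.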
One of the easiest and fastest ways to remove punctuation marks and special characters from a string is the translate() method. translate() is a built-in method of str objects; the string module supplies string.punctuation, which can serve as the set of characters to delete.
To remove punctuation with pandas, we can use the str.replace method of a DataFrame column. We call replace with a regex that matches all punctuation characters (passing regex=True so the pattern is treated as a regular expression) and replace the matches with empty strings. str.replace returns a new Series, which we assign back to df['text'].
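A minimal pandas sketch of this, assuming a DataFrame with a string column named `text` (the sample data is made up for illustration). The pattern `[^\w\s]` matches anything that is not a word character or whitespace; it removes apostrophes inside words as well.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"text": ["Hello, world!", "I'm a boy; you're a girl."]})

# regex=True is needed so the pattern is interpreted as a regular expression.
df["text"] = df["text"].str.replace(r"[^\w\s]", "", regex=True)
print(df["text"].tolist())
```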
You don't need a regular expression for this task. Use str.strip with string.punctuation:
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> '!Hello.'.strip(string.punctuation)
'Hello'
>>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
"Hello world I'm a boy you're a girl"
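I think this function will be helpful and concise in removing punctuation. It expects a list of words rather than a single string, and unlike str.strip it also removes punctuation inside words, so "you're" becomes "youre":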
import re

def remove_punct(words):
    new_words = []
    for word in words:
        w = re.sub(r'[^\w\s]', '', word)  # remove everything except word characters and whitespace
        w = re.sub(r'_', '', w)  # \w also matches underscore, so remove it separately
        new_words.append(w)
    return new_words
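The two substitutions can also be folded into one pattern with an alternation. This is a sketch of an equivalent variant; `remove_punct_one_pass` is a hypothetical name, and like the original it expects a list of words.

```python
import re

def remove_punct_one_pass(words):
    # [^\w\s] catches punctuation; |_ adds underscore, which \w would otherwise keep.
    return [re.sub(r"[^\w\s]|_", "", w) for w in words]

print(remove_punct_one_pass("Hello, world_ I'm here.".split()))
```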