I have a problem with my code and cannot figure out how to move forward.
    tweet = "I am tired! I like fruit...and milk"
    clean_words = tweet.translate(None, ",.;@#?!&$")
    words = clean_words.split()
    print tweet
    print words
Output:
['I', 'am', 'tired', 'I', 'like', 'fruitand', 'milk']
What I would like is to replace the punctuation with whitespace, but I do not know which function or loop to use. Can anyone help me, please?
Use regex to strip punctuation from a string in Python: the regex pattern [^\w\s] matches every character that is neither a word character nor whitespace (i.e. the punctuation), and re.sub can replace each match with an empty string.
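A minimal sketch of that pattern, applied to the tweet from the question. The helper name strip_punctuation and its replacement parameter are made up for illustration. Note that \w also matches the underscore, so this pattern leaves _ in place.

```python
import re

def strip_punctuation(text, replacement=""):
    # [^\w\s] matches any single character that is neither a word
    # character (letter, digit, underscore) nor whitespace.
    return re.sub(r"[^\w\s]", replacement, text)

tweet = "I am tired! I like fruit...and milk"

removed = strip_punctuation(tweet)      # punctuation deleted
spaced = strip_punctuation(tweet, " ")  # punctuation turned into spaces

print(removed)
print(spaced.split())
```

Passing " " as the replacement is the variant the question asks for: the dots between "fruit" and "and" become spaces, so split() separates the two words.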
We can use the replace() method to remove punctuation from a Python string by replacing each punctuation mark with an empty string. We iterate over the punctuation marks one by one and replace each of them with an empty string in the text.
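The loop described above might look like the following sketch. The function name remove_punctuation is hypothetical, and string.punctuation is assumed as the set of marks to remove.

```python
import string

def remove_punctuation(text):
    # Replace each punctuation mark in turn with an empty string.
    for mark in string.punctuation:
        text = text.replace(mark, "")
    return text

tweet_no_punct = remove_punctuation("I am tired! I like fruit...and milk")
print(tweet_no_punct)
```

Each replace() call scans the whole string, so for long texts a single translate() or re.sub() pass is usually the better choice.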
To remove punctuation with pandas, we can use the str.replace method of a DataFrame's text column. We call str.replace with a regex that matches all punctuation characters and replace them with empty strings, passing regex=True so the pattern is treated as a regular expression. str.replace returns a new column (a Series), which we assign back to df['text'].
string.punctuation is a pre-initialized string constant in Python's string module. It contains all ASCII punctuation characters. Syntax: string.punctuation. Parameters: none, since it is a constant and not a function.
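A quick way to inspect the constant, as a small sketch. The variable names are only for illustration.

```python
import string

# string.punctuation is a plain str, so it supports len() and "in".
print(string.punctuation)

punct_count = len(string.punctuation)
has_bang = "!" in string.punctuation
has_letter = "a" in string.punctuation

print(punct_count, has_bang, has_letter)
```

The constant covers ASCII punctuation only, so characters such as curly quotes are not included.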
It is easy to achieve by building a translation table with maketrans that maps every punctuation character to a space, and passing it to translate:

    import string

    tweet = "I am tired! I like fruit...and milk"
    # Map each punctuation character to a space
    translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    print(tweet.translate(translator))

This version runs on Python 3 (3.5.2 on my machine). On Python 2.x, use string.maketrans instead of str.maketrans. Hope that it works on yours too.
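Since the question ends with a word list, the translation-table approach can be followed by split(). This is a sketch assuming Python 3, where the table is built with str.maketrans. The function name tweet_to_words is made up for illustration.

```python
import string

def tweet_to_words(text):
    # Build a table mapping every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # split() with no arguments collapses the runs of spaces left behind.
    return text.translate(table).split()

words = tweet_to_words("I am tired! I like fruit...and milk")
print(words)
```

Because split() ignores repeated whitespace, the three spaces produced by "..." do not create empty entries in the list.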
Here is a regex-based solution that has been tested under Python 3.5.1. I think it is both simple and succinct.
    import re

    tweet = "I am tired! I like fruit...and milk"

    clean = re.sub(r"""
                   [,.;@#?!&$]+  # Accept one or more copies of punctuation
                   \ *           # plus zero or more copies of a space,
                   """,
                   " ",          # and replace it with a single space
                   tweet, flags=re.VERBOSE)
    print(tweet + "\n" + clean)
Results:
    I am tired! I like fruit...and milk
    I am tired I like fruit and milk
Compact version:
    import re

    tweet = "I am tired! I like fruit...and milk"
    clean = re.sub(r"[,.;@#?!&$]+\ *", " ", tweet)
    print(tweet + "\n" + clean)