
replace the punctuation with whitespace

I have a problem with this code and cannot figure out how to move forward.

    tweet = "I am tired! I like fruit...and milk"
    clean_words = tweet.translate(None, ",.;@#?!&$")
    words = clean_words.split()

    print tweet
    print words

Output:

['I', 'am', 'tired', 'I', 'like', 'fruitand', 'milk'] 

I would like to replace the punctuation with whitespace instead of deleting it, so that "fruit...and" becomes two words, but I do not know which function or loop to use. Can anyone help me, please?

oceano22 asked Jan 18 '16 17:01




2 Answers

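This is easy to achieve with a translation table, built by maketrans, that maps every punctuation character to a space:

    import string

    tweet = "I am tired! I like fruit...and milk"
    # Map each punctuation character to a space (Python 3: str.maketrans)
    translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    print(tweet.translate(translator))

As written, this runs on Python 3 (I tested on 3.5.2). On Python 2.x, str.maketrans does not exist, so build the table with string.maketrans(string.punctuation, ' ' * len(string.punctuation)) instead. Hope it works on yours too.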
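
To get the word list the question asks for, the translated string can then be split on whitespace. The following is only a minimal sketch, assuming Python 3; the helper name `split_words` is made up for illustration and is not part of the answer above.

```python
import string


def split_words(text, punctuation=string.punctuation):
    """Replace each punctuation character with a space, then split on whitespace."""
    # str.maketrans needs two strings of equal length:
    # one space for every punctuation character.
    table = str.maketrans(punctuation, " " * len(punctuation))
    # split() with no arguments collapses runs of spaces,
    # so "fruit...and" ends up as two separate words.
    return text.translate(table).split()


tweet = "I am tired! I like fruit...and milk"
words = split_words(tweet)
print(words)
# ['I', 'am', 'tired', 'I', 'like', 'fruit', 'and', 'milk']
```

Passing a shorter string as `punctuation`, for example ",.;@#?!&$" from the question, limits the replacement to those characters.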

YuanzhiKe answered Sep 20 '22 17:09



Here is a regex-based solution, tested under Python 3.5.1. I think it is both simple and succinct.

    import re

    tweet = "I am tired! I like fruit...and milk"
    clean = re.sub(r"""
                   [,.;@#?!&$]+  # Accept one or more copies of punctuation
                   \ *           # plus zero or more copies of a space,
                   """,
                   " ",          # and replace it with a single space
                   tweet, flags=re.VERBOSE)
    print(tweet + "\n" + clean)

Results:

    I am tired! I like fruit...and milk
    I am tired I like fruit and milk

Compact version:

    import re

    tweet = "I am tired! I like fruit...and milk"
    clean = re.sub(r"[,.;@#?!&$]+\ *", " ", tweet)
    print(tweet + "\n" + clean)
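
If the fixed list of characters is too narrow, the same idea can be extended to every ASCII punctuation mark. This is a rough sketch rather than part of the tested answer: it assumes Python 3, builds the character class from `string.punctuation`, and the names `PUNCT_RUN` and `replace_punctuation` are hypothetical.

```python
import re
import string

# Character class covering every ASCII punctuation mark. re.escape keeps
# characters such as ] and \ from being read as regex syntax.
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+")


def replace_punctuation(text):
    """Replace each run of punctuation characters with a single space."""
    return PUNCT_RUN.sub(" ", text)


tweet = "I am tired! I like fruit...and milk"
clean = replace_punctuation(tweet)
# Unlike the pattern above, this one does not swallow trailing spaces,
# so split() is used to discard any doubled whitespace.
words = clean.split()
print(words)
```

Note that this also replaces apostrophes and hyphens, so "don't" would be split into two words; trim the character class if that is not wanted.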
Jonathan answered Sep 17 '22 17:09
