
strip punctuation with regex - python


I need to use regex to strip punctuation at the start and end of a word. It seems like regex would be the best option for this. I don't want punctuation removed from words like 'you're', which is why I'm not using .replace().

user2696287 asked Aug 25 '13 12:08


People also ask

How do you remove punctuation from regular expressions in Python?

One of the easiest ways to remove punctuation from a string in Python is the str.translate() method. translate() takes a translation table, which we can build with str.maketrans().
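As a rough illustration of the translate() approach, here is a minimal sketch. The names REMOVE_PUNCT and remove_all_punctuation are made up for this example. Note that it deletes every ASCII punctuation character, including apostrophes inside words, so it does not by itself do what the question asks.

```python
import string

# Translation table that maps every ASCII punctuation character to None,
# which tells str.translate() to delete it.
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)


def remove_all_punctuation(text):
    """Delete every ASCII punctuation character, including ones inside words."""
    return text.translate(REMOVE_PUNCT)


print(remove_all_punctuation("Hello, world. You're here!"))
# prints: Hello world Youre here
```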

How do you remove punctuation from a string using string punctuation in Python?

We can use the replace() method to remove punctuation from a Python string by replacing each punctuation mark with an empty string. We iterate over the punctuation marks one by one and replace each occurrence in the text with an empty string.
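The replace() loop described here might look like the sketch below. The function name remove_punct_with_replace is hypothetical. Like translate(), it removes punctuation everywhere in the string, not only at the edges of words.

```python
import string


def remove_punct_with_replace(text):
    """Remove each ASCII punctuation mark by replacing it with an empty string."""
    for mark in string.punctuation:
        text = text.replace(mark, "")
    return text


print(remove_punct_with_replace("Wait... what?!"))
# prints: Wait what
```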

How do you remove special and punctuation characters in Python?

One of the easiest and fastest ways to remove punctuation marks and special characters from a string is the translate() method. translate() is a built-in method of str objects; the string module supplies the punctuation constant that is commonly used to build the translation table.

How do you remove punctuation from a column in Python?

To remove punctuation with pandas, we can use the Series.str.replace method on a DataFrame column. We call replace with a regex pattern that matches all punctuation characters (passing regex=True) and replace them with empty strings. replace returns a new Series, which we assign back to df['text'].


2 Answers

You don't need a regular expression for this task. Use str.strip with string.punctuation. strip only removes leading and trailing characters, so apostrophes inside words are kept:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> '!Hello.'.strip(string.punctuation)
'Hello'

>>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
"Hello world I'm a boy you're a girl"
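If a regular expression is still preferred, as the question asked, the same edge-only stripping could be sketched roughly as follows. This is an illustration, not part of the original answer: the names _EDGE_PUNCT and strip_edge_punct are made up here, and the pattern assumes "punctuation" means the ASCII characters in string.punctuation.

```python
import re
import string

# Character class containing every ASCII punctuation character, safely escaped.
_PUNCT = re.escape(string.punctuation)

# Match a run of punctuation at the very start OR at the very end of the word.
_EDGE_PUNCT = re.compile(rf"^[{_PUNCT}]+|[{_PUNCT}]+$")


def strip_edge_punct(word):
    """Remove leading and trailing punctuation, leaving interior characters alone."""
    return _EDGE_PUNCT.sub("", word)


sentence = "Hello, world. I'm a boy, you're a girl."
print(" ".join(strip_edge_punct(w) for w in sentence.split()))
# prints: Hello world I'm a boy you're a girl
```

For this input the result matches the str.strip version, which is simpler and is probably the better choice unless the pattern needs to be extended.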
falsetru answered Oct 05 '22 02:10


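I think this function is a helpful, concise way to remove punctuation. It takes a list of words and removes every punctuation character from each one, including apostrophes inside words such as 'you're':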

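import re

def remove_punct(text):
    """Remove punctuation from each word; text is an iterable of words, e.g. sentence.split()."""
    new_words = []
    for word in text:
        w = re.sub(r'[^\w\s]', '', word)  # remove everything except word characters and whitespace
        w = re.sub(r'_', '', w)  # \w also matches the underscore, so remove it separately
        new_words.append(w)
    return new_words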
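The two substitutions can also be folded into a single pattern. The sketch below is one possible variant, with the hypothetical names _NON_WORD and remove_punct_single_pass. The printed result shows that this approach drops the apostrophe in "You're", unlike edge-only stripping.

```python
import re

# Match anything that is neither a word character nor whitespace, or an underscore.
_NON_WORD = re.compile(r"[^\w\s]|_")


def remove_punct_single_pass(words):
    """Remove all punctuation (and underscores) from each word in an iterable."""
    return [_NON_WORD.sub("", word) for word in words]


print(remove_punct_single_pass("Hello, world. You're a_b".split()))
# prints: ['Hello', 'world', 'Youre', 'ab']
```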
Shalini Baranwal answered Oct 05 '22 03:10