I have this code for removing all punctuation from a string using a regex:
    import regex as re
    re.sub(ur"\p{P}+", "", txt)
How would I change it to allow hyphens? If you could explain how you did it, that would be great. I understand that here, correct me if I'm wrong, P with anything after it is punctuation.
Use regex to strip punctuation from a string in Python: the pattern [^\w\s] matches every character that is not a word character or whitespace (i.e. punctuation and symbols), and each match is replaced with an empty string. Note that \w includes the underscore, so underscores are kept.
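As a quick illustration of the [^\w\s] approach, here is a minimal sketch using the standard re module. The function name strip_punctuation and the sample string are made up for this example.

```python
import re

def strip_punctuation(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)

# The hyphen and apostrophe are removed; the underscore survives because \w matches it.
print(strip_punctuation("Hello, world! It's a snake_case-name."))
# Hello world Its a snake_casename
```

Because this pattern removes hyphens too, it does not answer the question as asked; it only shows the general idea.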
One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. The translate method takes a translation table, which we can build with the str.maketrans() method.
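A minimal Python 3 sketch of the translate approach follows. It assumes that "punctuation" means the ASCII characters in string.punctuation; the names table and remove_ascii_punctuation are illustrative only.

```python
import string

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans("", "", string.punctuation)

def remove_ascii_punctuation(text):
    """Delete every ASCII punctuation character from text."""
    return text.translate(table)

print(remove_ascii_punctuation("Wait... what?! (really)"))
# Wait what really
```

Non-ASCII punctuation such as curly quotes is left untouched by this table.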
In C#, you can use this: Regex.Replace("This is a test string, with lots of: punctuations; in it?!.", @"[^\w\s]", "");
To remove punctuation with pandas, we can use the str.replace method of a DataFrame column. We call str.replace with a regex that matches all punctuation characters and replace them with empty strings (in recent pandas versions, pass regex=True so the pattern is treated as a regex). str.replace returns a new column, which we assign back to df['text'].
[^\P{P}-]+

\P is the complement of \p, i.e. not punctuation. So this matches anything that is not (not punctuation or a dash), resulting in all punctuation except dashes.
Example: http://www.rubular.com/r/JsdNM3nFJ3
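The same "all punctuation except the hyphen" logic can be sketched without a regex at all. This is a swapped-in technique rather than the pattern above: it uses the standard unicodedata module and assumes that \p{P} corresponds to the Unicode general categories starting with "P". The function name is hypothetical.

```python
import unicodedata

def strip_punct_keep_hyphen(text):
    """Drop characters in any Unicode punctuation category (P*), except '-'."""
    return "".join(
        ch for ch in text
        if ch == "-" or not unicodedata.category(ch).startswith("P")
    )

print(strip_punct_keep_hyphen("well-known: «yes», no?"))
# well-known yes no
```

Only the ASCII hyphen-minus is kept here; other dashes such as the en dash are still removed. Characters like $ and + are classified as symbols rather than punctuation, so they are left in place.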
If you want a non-convoluted way, an alternative is \p{P}(?<!-): match all punctuation, and then check it wasn't a dash (using negative lookbehind).
Working example: http://www.rubular.com/r/5G62iSYTdk
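The lookbehind trick can be tried with the standard re module as well. Since re does not support \p{P}, this sketch substitutes [^\w\s] for it, which is an assumption of the example and also matches symbols, not only punctuation.

```python
import re

# Match one non-word, non-space character, then look back at that same
# character and reject the match if it was a hyphen.
pattern = re.compile(r"[^\w\s](?<!-)")

print(pattern.sub("", "re-use, don't re-invent!"))
# re-use dont re-invent
```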
Here's how to do it with the re module, in case you have to stick with the standard libraries:
    # works in python 2 and 3
    import re
    import string

    remove = string.punctuation
    remove = remove.replace("-", "")  # don't remove hyphens
    # create the pattern; re.escape keeps \, ] and ^ literal inside the character class
    pattern = r"[{}]".format(re.escape(remove))

    txt = ")*^%{}[]thi's - is - @@#!a !%%!!%- test."
    re.sub(pattern, "", txt)
    # >>> 'this - is - a - test'
If performance matters, you may want to use str.translate, since it's faster than using a regex. In Python 3, the code is txt.translate({ord(char): None for char in remove}).
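For completeness, here is a self-contained Python 3 sketch of the str.translate variant, checked against a regex built from the same character set. The variable names table, translated and via_regex are illustrative, and no timing is measured here.

```python
import re
import string

remove = string.punctuation.replace("-", "")   # ASCII punctuation minus the hyphen
table = {ord(char): None for char in remove}   # mapping a code point to None deletes it

txt = ")*^%{}[]thi's - is - @@#!a !%%!!%- test."
translated = txt.translate(table)
via_regex = re.sub("[{}]".format(re.escape(remove)), "", txt)

print(translated)
# this - is - a - test
print(translated == via_regex)
# True
```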