I want to extract the words from a text string in Python:
import re
s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."
result = re.sub("\b[^\w\d_]+\b", " ", s ).split()
print(result)
I am getting:
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is:', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
How can I get "is" instead of "is:" when the string contains a colon?
I thought using \b would be enough...
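I think you intended to pass a raw string to re.sub (notice the r prefix). In an ordinary string literal, Python interprets \b as a backspace character, so the regex engine never sees a word boundary.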
result = re.sub(r"\b[^\w\d_]+\b", " ", s ).split()
Returns:
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
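A quick way to see what the r prefix changes is to inspect the two literals directly. This is only a minimal sketch: the shortened sample string and the variable names are mine, and \W stands in for the original [^\w\d_], which matches the same characters because \w already covers digits and the underscore.

```python
import re

plain = "\b"   # ordinary literal: Python turns \b into one backspace character
raw = r"\b"    # raw literal: the backslash and the 'b' both reach the regex engine

print(len(plain), repr(plain))
print(len(raw), repr(raw))

s = "now is: science"

# The pattern starts and ends with a literal backspace, which never occurs in s,
# so nothing is replaced and the colon stays attached to "is".
no_match = re.sub("\b\\W+\b", " ", s).split()

# The pattern is a run of non-word characters between two word boundaries,
# so both " " and ": " are replaced by a single space.
match = re.sub(r"\b\W+\b", " ", s).split()

print(no_match)
print(match)
```

Writing regex patterns as raw strings by default avoids this class of surprise.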
You forgot to make the pattern a raw string literal (r"..."):
>>> import re
>>> s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."
>>> re.sub("\b[^\w\d_]+\b", " ", s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is:', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
>>> re.sub(r"\b[^\w\d_]+\b", " ", s ).split()
['The', 'saddest', 'aspect', 'of', 'life', 'right', 'now', 'is', 'science', 'gathers', 'knowledge', 'faster', 'than', 'society', 'gathers', 'wisdom.']
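Note that the last item is still 'wisdom.', because there is no word boundary between the final period and the end of the string. If the goal is only the words themselves, one alternative (a sketch of a different approach, not what the answers above use) is to match the words with re.findall instead of replacing what lies between them:

```python
import re

s = "The saddest aspect of life right now is: science gathers knowledge faster than society gathers wisdom."

# \w+ matches each maximal run of letters, digits and underscores,
# so punctuation such as ":" and the trailing "." is simply skipped.
words = re.findall(r"\w+", s)
print(words)
```

One caveat with this approach: \w+ also splits on apostrophes and hyphens, so "don't" would come back as "don" and "t".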