
python regex find all words in text

Tags:

python

regex

This sounds very simple, I know, but for some reason I can't get all the results I need.

A "word" in this case is any run of non-white-space characters delimited by white-space. For example, for the string "Hello there stackoverflow." the result should be: ['Hello','there','stackoverflow.']

My code:

import re

text = "Hello there stackoverflow."
word_pattern = r"^\S*\s|\s\S*\s|\s\S*$"
result = re.findall(word_pattern, text)
print(result)

But when I use this pattern on a string like the one shown above, the list only contains the first and last words, not the words that have white-space on both sides.

What is the problem with this pattern?


1 Answer

Use the \b boundary test instead:

r'\b\S+\b'

Result:

>>> import re
>>> re.findall(r'\b\S+\b', 'Hello there StackOverflow.')
['Hello', 'there', 'StackOverflow']

Alternatively, skip the regular expression and just use .split(). Unlike the regex above, which did not match the . at the end of the sentence, .split() keeps the punctuation attached to the words.
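As for why the pattern in the question misses the middle word: re.findall returns non-overlapping matches, and every alternative in that pattern consumes the white-space around the word. The space after "Hello" is used up by the first match, so "there" no longer has a leading space left to match against. The sketch below illustrates this and compares two ways of getting the output the question asks for. The r"\S+" pattern is an extra suggestion, and the variable names are only for illustration.

```python
import re

text = "Hello there stackoverflow."

# Pattern from the question: each alternative also matches the
# surrounding white-space, so neighbouring words compete for it.
question_pattern = r"^\S*\s|\s\S*\s|\s\S*$"
print(re.findall(question_pattern, text))  # ['Hello ', ' stackoverflow.']

# Matching only the runs of non-white-space avoids consuming the separators.
words_regex = re.findall(r"\S+", text)

# str.split() with no arguments splits on any run of white-space.
words_split = text.split()

print(words_regex)  # ['Hello', 'there', 'stackoverflow.']
print(words_split)  # ['Hello', 'there', 'stackoverflow.']
```

Both alternatives keep the trailing period, which matches the result requested in the question. Use r'\b\S+\b' instead if the trailing punctuation should be dropped.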

Martijn Pieters answered Apr 24 '26 18:04


