This sounds very simple, I know, but for some reason I can't get all the results I need
Word in this case is any char but white-space that is separetaed with white-space for example in the following string: "Hello there stackoverflow." the result should be: ['Hello','there','stackoverflow.']
My code:
import re
word_pattern = "^\S*\s|\s\S*\s|\s\S*$"
result = re.findall(word_pattern,text)
print result
but after using this pattern on a string like I've shown it only puts the first and the last words in the list and not the words separeted with two spaces
What is the problem with this pattern?
Use the \b boundary test instead:
r'\b\S+\b'
Result:
>>> import re
>>> re.findall(r'\b\S+\b', 'Hello there StackOverflow.')
['Hello', 'there', 'StackOverflow']
or not use a regular expression at all and just use .split(); the latter would include the punctiation in a sentence (the regex above did not match the . in the sentence).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With