python regex find all words in text

Question

This sounds very simple, I know, but for some reason I can't get all the results I need.

A word in this case is any run of non-whitespace characters separated by whitespace. For example, for the string "Hello there stackoverflow." the result should be: ['Hello', 'there', 'stackoverflow.']

My code:

import re

text = "Hello there stackoverflow."
word_pattern = r"^\S*\s|\s\S*\s|\s\S*$"
result = re.findall(word_pattern, text)
print(result)

But when I apply this pattern to a string like the one above, it only puts the first and last words in the list, not the words that have a space on both sides.
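A minimal Python 3 reproduction, assuming the sample sentence above as `text`, shows what `re.findall` actually returns for this pattern. The matches keep their surrounding spaces, and "there" is missing. `findall` returns non-overlapping matches, so the space consumed at the end of "Hello " is no longer available as the leading `\s` that the middle alternative needs in front of "there".

```python
import re

text = "Hello there stackoverflow."

# Pattern from the question: each alternative consumes the whitespace
# around a word, so two neighbouring matches cannot share one space.
word_pattern = r"^\S*\s|\s\S*\s|\s\S*$"
result = re.findall(word_pattern, text)

print(result)  # ['Hello ', ' stackoverflow.']
```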

What is the problem with this pattern?

Martijn Pieters · Accepted Answer

Use the \b boundary test instead:

r'\b\S+\b'

Result:

>>> import re
>>> re.findall(r'\b\S+\b', 'Hello there StackOverflow.')
['Hello', 'there', 'StackOverflow']

Or don't use a regular expression at all and just use .split(); the latter would include the punctuation in a sentence (the regex above did not match the . at the end of the sentence).
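A short sketch comparing the options on the question's sentence. `str.split()` with no arguments splits on runs of whitespace, and the regex `\S+` (an editorial addition, not part of the original answer) matches exactly those runs, so both follow the question's definition and keep the trailing period. The `\b\S+\b` pattern above has to end on a word boundary, so it trims the ".".

```python
import re

text = "Hello there stackoverflow."

# str.split() with no separator splits on any run of whitespace
# and ignores leading/trailing whitespace.
by_split = text.split()

# \S+ matches maximal runs of non-whitespace, i.e. the same tokens.
by_regex = re.findall(r"\S+", text)

# \b\S+\b (from the answer) must start and end on a word boundary,
# so the final "." here is left out of the last match.
by_boundary = re.findall(r"\b\S+\b", text)

print(by_split)     # ['Hello', 'there', 'stackoverflow.']
print(by_regex)     # ['Hello', 'there', 'stackoverflow.']
print(by_boundary)  # ['Hello', 'there', 'stackoverflow']
```

With several spaces, tabs or newlines between words, `split()` and `\S+` still agree, since both treat any run of whitespace as a single separator.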

Tags:

python

regex
