Just like the question title.
I'm new to Python and regular expressions. I need to search a paragraph for a specific word and show the indices of all its occurrences.
For example:
the paragraph is:
This is a testing text and used to test and test and test.
and the word:
test
The algorithm should return the indices of the 3 non-overlapping occurrences of the word test in the above paragraph (but not testing, because I want to match the whole word, not just a substring).
Another example with the same paragraph and this "word":
test and
The algorithm should return 2 occurrences of test and.
I guess I must use a regular expression to match the whole word, where the characters before and after it may be punctuation such as . , ; ? -
After Googling, I found that something like re.finditer should be used, but I haven't figured out the right way to use it. Please help, thank you in advance. ;)
Yes, finditer is the way to go. Use start() to find the index of the match.
Example:
import re

a = "This is a testing text and used to test and test and test."
# \b matches a word boundary, so the "test" inside "testing" is skipped
print([m.start() for m in re.finditer(r"\btest\b", a)])
print([m.start() for m in re.finditer(r"\btest and\b", a)])
Output:
[35, 44, 53]
[35, 44]
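If the word or phrase comes from user input rather than being hard-coded, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_phrase_indices is made up for illustration, and it assumes the phrase begins and ends with a word character (a letter, digit, or underscore), since that is what \b relies on.

```python
import re

def find_phrase_indices(phrase, text):
    """Return the start index of each non-overlapping whole-word match of phrase in text."""
    # re.escape makes characters such as "." or "?" in the phrase match literally
    # instead of being interpreted as regex syntax.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]

text = "This is a testing text and used to test and test and test."
print(find_phrase_indices("test", text))      # [35, 44, 53]
print(find_phrase_indices("test and", text))  # [35, 44]
```

If you also need to know where each match ends, m.end() or m.span() on the match object gives that alongside m.start().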