I'm trying to locate all index positions of a string in a list of words and I want the values returned as a list. I would like to find the string if it is on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.
The following code captures only "cow" and misses both "test;cow" and "cow."
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print indices
>> [5]
I have tried changing the code to use a regular expression:
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print indices
But this gives an error: expected string or buffer
If anyone knows what I'm doing wrong I'd be very happy to hear. I have a feeling it's something to do with the fact I'm trying to use a regular expression in there when it's expecting a string. Is there a solution?
The output I'm looking for should read:
>> [0, 4, 5]
Thanks
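You don't need to compare the result of the match with x, and the match should be run on x rather than on the whole list. Passing the list to re.match is what raises "expected string or buffer". Note also that myString inside the quoted pattern is literal text, not the variable, so the pattern has to be built from the variable's value.
Also, you need to use re.search instead of re.match, since your regex pattern '\W*myString\W*' will not match the first element: re.match only matches at the start of the string, and test; is not matched by \W*. Actually, you only need to test the characters immediately before and after the word, not the complete string.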
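To illustrate the difference, here is a minimal sketch comparing re.match and re.search on the first list element. The names word and loose_pattern are made up for this illustration and are not from the question. It also shows that switching to re.search alone is not enough, because \W* can match nothing and so 'acow' is still found.

```python
import re

word = 'cow'
# Build the pattern from the variable's value; 'myString' inside quotes
# would be matched as literal text.
loose_pattern = r'\W*' + re.escape(word) + r'\W*'

# re.match only tries at position 0, where 'test;' blocks the match.
match_result = re.match(loose_pattern, 'test;cow')
# re.search scans the whole string and finds ';cow'.
search_result = re.search(loose_pattern, 'test;cow')
# \W* may match the empty string, so the substring case still matches.
substring_result = re.search(loose_pattern, 'acow')

print(match_result is None)          # True
print(search_result is not None)     # True
print(substring_result is not None)  # True
```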
So you can use word boundaries around the string instead:
pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
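Put together as a self-contained script, the approach could look like the sketch below. The helper name find_word_indices is hypothetical, and print is written with parentheses so the snippet runs on both Python 2 and Python 3.

```python
import re

my_list = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
my_string = 'cow'

def find_word_indices(words, target):
    """Return the indices of the items that contain target as a whole word."""
    # \b matches between a word character and a non-word character
    # (or the start/end of the string); re.escape guards against
    # regex metacharacters in target.
    pattern = re.compile(r'\b' + re.escape(target) + r'\b')
    return [i for i, x in enumerate(words) if pattern.search(x)]

indices = find_word_indices(my_list, my_string)
print(indices)  # [0, 4, 5]
```

Keep in mind that \b treats digits and underscores as word characters, so an item such as 'cow_1' would not be matched by this pattern.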