I'm trying to get the start and end positions of a query in a sequence by using re.findall:
import re
sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'
findall = re.findall(query, sequence)
# ['aaa', 'aaa']
How do I get something like findall.start() or findall.end()?
I would like to get:
start = [0, 6]
end = [2, 8]
I know that
search = re.search(query, sequence)
print(search.start(), search.end())
# 0 3
only gives me the first match.
Use re.finditer:
>>> import re
>>> sequence = 'aaabbbaaacccdddeeefff'
>>> query = 'aaa'
>>> r = re.compile(query)
>>> [[m.start(),m.end()] for m in r.finditer(sequence)]
[[0, 3], [6, 9]]
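Note that m.end() is exclusive (it points one past the last matched character), which is why the output above shows 3 and 9 rather than the 2 and 8 the question asks for. A minimal sketch that builds the two separate lists in the question's format, assuming "end" means the index of the last matched character and that query is a literal substring rather than a regex (re.escape makes that explicit):

```python
import re

sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'

# re.escape treats query as a literal string (assumption: no regex intended)
matches = list(re.finditer(re.escape(query), sequence))

start = [m.start() for m in matches]
# m.end() is exclusive, so subtract 1 to get the inclusive last index
end = [m.end() - 1 for m in matches]

print(start)  # [0, 6]
print(end)    # [2, 8]
```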
From the docs: "Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found."
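The quoted docs stress that matches are non-overlapping: once a match is found, scanning resumes after its end. If overlapping hits are ever needed (an assumption, not something the question asks for), a common workaround is to wrap the pattern in a zero-width lookahead with a capturing group and read the positions from group 1. A sketch:

```python
import re

text = 'aaaa'

# Ordinary finditer: matches never overlap, so 'aa' is found only twice
plain = [m.span() for m in re.finditer('aa', text)]
print(plain)  # [(0, 2), (2, 4)]

# (?=(aa)) matches the empty string wherever 'aa' begins, so every
# overlapping occurrence is reported; group 1 holds the actual 'aa' span
overlapping = [m.span(1) for m in re.finditer('(?=(aa))', text)]
print(overlapping)  # [(0, 2), (1, 3), (2, 4)]
```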
You can't. findall is a convenience function that, as the docs say, returns "a list of strings". If you want a list of MatchObjects, you can't use findall.
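To make the contrast concrete, a small sketch using the question's data: findall hands back only the matched text, while finditer yields match objects that still carry their positions.

```python
import re

sequence = 'aaabbbaaacccdddeeefff'

# findall: plain strings, no position information survives
found = re.findall('aaa', sequence)
print(found)  # ['aaa', 'aaa']

# finditer: match objects, each of which knows where it matched
iterated = list(re.finditer('aaa', sequence))
print([m.span() for m in iterated])  # [(0, 3), (6, 9)]
```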
However, you can use finditer. If you're just iterating over the matches with for match in re.findall(…):, you can use for match in re.finditer(…) the same way, except that you get MatchObject values instead of strings. If you actually need a list, just use matches = list(re.finditer(…)).
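A sketch of both patterns on the question's data: the loop has the same shape as a findall loop, and list() is only needed when you want to index or reuse the results (finditer returns a one-shot iterator).

```python
import re

sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'

# Drop-in replacement for a findall loop; each item is a match object
for match in re.finditer(query, sequence):
    print(match.group(), match.start(), match.end())

# Materialize the iterator when you need indexing or multiple passes
matches = list(re.finditer(query, sequence))
print(len(matches), matches[1].start())  # 2 6
```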
Use finditer instead of findall. It returns an iterator yielding MatchObject instances, and you can get the start and end positions from each MatchObject.
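A short sketch of reading positions from each match object: span() returns the (start, end) pair in one call, and because end is exclusive, slicing the string with it gives back exactly the matched text.

```python
import re

sequence = 'aaabbbaaacccdddeeefff'

spans = []
for m in re.finditer('aaa', sequence):
    start, end = m.span()  # same as (m.start(), m.end())
    # end is exclusive, so the slice recovers exactly the matched text
    assert sequence[start:end] == m.group()
    spans.append((start, end))

print(spans)  # [(0, 3), (6, 9)]
```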