Regex findall start() and end()? Python

I'm trying to get the start and end positions of a query in a sequence by using re.findall:

>>> import re
>>> sequence = 'aaabbbaaacccdddeeefff'
>>> query = 'aaa'
>>> findall = re.findall(query, sequence)
>>> findall
['aaa', 'aaa']

How do I get something like findall.start() or findall.end()?

I would like to get:

start = [0, 6]
end = [2, 8]

I know that

>>> search = re.search(query, sequence)
>>> search.start(), search.end()
(0, 3)

gives me the positions, but only for the first instance.

O.rka asked Jul 11 '13 22:07


3 Answers

Use re.finditer:

>>> import re
>>> sequence = 'aaabbbaaacccdddeeefff'
>>> query = 'aaa'
>>> r = re.compile(query)
>>> [[m.start(),m.end()] for m in r.finditer(sequence)]
[[0, 3], [6, 9]]
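To get the two separate lists from the question, a minimal sketch (the names start, end and end_inclusive are just illustrative). Note that end() is exclusive: it is one past the last matched character, so sequence[m.start():m.end()] is the match itself. That is why it yields [3, 9] rather than the [2, 8] in the question; subtracting 1 gives the inclusive positions.

```python
import re

sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'

matches = list(re.finditer(query, sequence))

# start() is the index of the first matched character.
start = [m.start() for m in matches]
# end() is one past the last matched character (exclusive, like slice bounds).
end = [m.end() for m in matches]
# Inclusive index of the last matched character.
end_inclusive = [m.end() - 1 for m in matches]

print(start)          # [0, 6]
print(end)            # [3, 9]
print(end_inclusive)  # [2, 8]
```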

From the docs:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.
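The "non-overlapping" part matters when occurrences of the query can overlap. A small sketch on a hypothetical string 'aaaa' with query 'aa': plain finditer resumes scanning after each match, while a zero-width lookahead with a capturing group, (?=(aa)), matches at every position where 'aa' begins, so overlapping occurrences are reported too.

```python
import re

sequence = 'aaaa'  # hypothetical input chosen so that matches can overlap

# Non-overlapping: after matching 'aa' at 0, scanning resumes at index 2.
non_overlapping = [(m.start(), m.end()) for m in re.finditer('aa', sequence)]
print(non_overlapping)  # [(0, 2), (2, 4)]

# Overlapping: the lookahead itself consumes nothing, and group 1 records
# where the 'aa' inside it starts and ends.
overlapping = [(m.start(1), m.end(1)) for m in re.finditer('(?=(aa))', sequence)]
print(overlapping)  # [(0, 2), (1, 3), (2, 4)]
```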

Ashwini Chaudhary answered Oct 17 '22 11:10


You can't. findall is a convenience function that, as the docs say, returns "a list of strings". If you want a list of MatchObjects, you can't use findall.

However, you can use finditer. If you're just iterating over the matches with for match in re.findall(…):, you can write for match in re.finditer(…): in exactly the same way, except that each match is a MatchObject instead of a string. If you actually need a list, just use matches = list(re.finditer(…)).
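A quick sketch of the drop-in replacement described above, using the sequence and query from the question: the loop looks the same, but each item is a match object, and list() turns the iterator into a real list when indexing or len() is needed.

```python
import re

sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'

# findall: plain strings, no position information.
print(re.findall(query, sequence))  # ['aaa', 'aaa']

# finditer: same loop shape, but each item is a match object.
for match in re.finditer(query, sequence):
    print(match.group(), match.start(), match.end())
# aaa 0 3
# aaa 6 9

# Materialize the iterator when a list is actually needed.
matches = list(re.finditer(query, sequence))
print(len(matches), matches[1].start())  # 2 6
```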

abarnert answered Oct 17 '22 11:10


Use finditer instead of findall. It gives you back an iterator yielding MatchObject instances, and you can get the start and end positions from each MatchObject.
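Each match object also has span(), which returns (start(), end()) in one call. A sketch that collects the spans and then transposes them with zip(*…) into one tuple of starts and one of ends; the empty-case guard is there because unpacking zip() of an empty list would raise ValueError.

```python
import re

sequence = 'aaabbbaaacccdddeeefff'
query = 'aaa'

# span() is equivalent to (m.start(), m.end()).
spans = [m.span() for m in re.finditer(query, sequence)]
print(spans)  # [(0, 3), (6, 9)]

# Transpose the pairs; fall back to empty tuples when there are no matches.
starts, ends = zip(*spans) if spans else ((), ())
print(starts, ends)  # (0, 6) (3, 9)
```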

tuckermi answered Oct 17 '22 12:10