
Find the indexes of all regex matches?

I'm parsing strings that could have any number of quoted strings inside them (I'm parsing code, and trying to avoid PLY). I want to find out whether a substring is quoted, and I have the substring's index. My initial thought was to use re to find all the matches and then figure out the range of indexes they represent.

It seems like I should use re with a regex like \"[^\"]+\"|'[^']+' (I'm avoiding triple-quoted strings and the like at the moment). When I use findall() I get a list of the matching strings, which is somewhat nice, but I need indexes.

My substring might be as simple as c, and I need to figure out if this particular c is actually quoted or not.

xitrium asked Aug 19 '10 07:08




3 Answers

This is what you want: (source)

re.finditer(pattern, string[, flags])  

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

You can then get the start and end positions from the MatchObjects.

e.g.

[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)] 
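
To answer the original question with these spans, one possible sketch follows. The helper names `quoted_spans` and `is_quoted` and the sample string are illustrative, not from the question, and the pattern is the question's own, so triple-quoted strings and escaped quotes are not handled. The idea is to collect the spans once and test whether a given index falls inside any of them:

```python
import re

# Pattern from the question: double- or single-quoted runs
# (no triple-quoted strings, no escaped quotes).
QUOTED = re.compile(r"\"[^\"]+\"|'[^']+'")


def quoted_spans(text):
    """Return (start, end) pairs for each quoted string; end is exclusive."""
    return [m.span() for m in QUOTED.finditer(text)]


def is_quoted(text, index):
    """Return True if the character at `index` lies inside a quoted string.

    The quote characters themselves count as inside.
    """
    return any(start <= index < end for start, end in quoted_spans(text))


sample = "c = 'abc' + c"
print(quoted_spans(sample))   # [(4, 9)]
print(is_quoted(sample, 0))   # False: the first c is bare
print(is_quoted(sample, 7))   # True: this c sits inside 'abc'
```

Because `m.span()` returns an exclusive end, the half-open comparison `start <= index < end` avoids off-by-one adjustments.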
Dave Kirby answered Oct 02 '22 08:10


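# To get the indices of all occurrences, including overlapping ones.
# The printed end index is inclusive; (-1, -1) is printed if there is no match.

import re

S = input()  # source string
k = input()  # pattern to search for
pattern = re.compile(k)
r = pattern.search(S)
if not r:
    print("(-1, -1)")
while r:
    print("({0}, {1})".format(r.start(), r.end() - 1))
    # Resume one character past the previous start so overlapping matches are found.
    r = pattern.search(S, r.start() + 1)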
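
The same loop can be wrapped in a function that returns the spans instead of printing them. This is only a sketch: the name `overlapping_spans` is hypothetical, and it keeps the inclusive end index used above.

```python
import re


def overlapping_spans(pattern, text):
    """Return (start, inclusive_end) for every match, overlapping ones included."""
    regex = re.compile(pattern)
    spans = []
    match = regex.search(text)
    while match:
        spans.append((match.start(), match.end() - 1))
        # Restart one character after the previous match's start.
        match = regex.search(text, match.start() + 1)
    return spans


print(overlapping_spans("aa", "aaaa"))  # [(0, 1), (1, 2), (2, 3)]
```

An empty list takes the place of the printed (-1, -1) when nothing matches.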
Be Champzz answered Oct 02 '22 07:10


This should solve your issue:

pattern=r"(?=(\"[^\"]+\"|'[^']+'))"

Then use the following to get the indices of all matches, including overlapping ones (the end index is inclusive):

indicesTuple=[(mObj.start(1),mObj.end(1)-1) for mObj in re.finditer(pattern,string)]
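
One thing to check before relying on the lookahead version: because a match can start at any position, a closing quote can also be treated as an opening quote. The comparison below is a small sketch with an illustrative sample string (not from the question), showing the plain pattern and the lookahead pattern side by side:

```python
import re

text = '"a" b "c"'

plain = re.compile(r"\"[^\"]+\"|'[^']+'")
lookahead = re.compile(r"(?=(\"[^\"]+\"|'[^']+'))")

# Non-overlapping scan: each quote character is consumed only once.
plain_spans = [(m.start(), m.end() - 1) for m in plain.finditer(text)]

# Lookahead scan: a match may start at every position, closing quotes included.
lookahead_spans = [(m.start(1), m.end(1) - 1) for m in lookahead.finditer(text)]

print(plain_spans)      # [(0, 2), (6, 8)]
print(lookahead_spans)  # [(0, 2), (2, 6), (6, 8)]
```

The extra span (2, 6) covers the unquoted text between the two strings, so for deciding whether an index is quoted the non-overlapping scan is probably the safer choice.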

Omkar Rahane answered Oct 02 '22 08:10