Find the indexes of all regex matches?

Tags: python, regex, indexing

I'm parsing strings that could have any number of quoted strings inside them (I'm parsing code, and trying to avoid PLY). I want to find out whether a substring is quoted, and I have the substring's index. My initial thought was to use the re module to find all the matches and then work out the range of indexes each one covers.

It seems like I should use re with a regex like \"[^\"]+\"|'[^']+' (I'm not dealing with triple-quoted strings and the like at the moment). When I use findall() I get a list of the matching strings, which is somewhat nice, but I need indexes.

My substring might be as simple as c, and I need to figure out if this particular c is actually quoted or not.


asked Aug 19 '10 07:08

xitrium

3 Answers

This is what you want, from the documentation for the re module:

re.finditer(pattern, string[, flags])  
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

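You can then get the start and end positions from each MatchObject with start() and end(). Note that end() is exclusive: it is the index one past the last matched character.

For example: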

[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
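To connect this back to the question, here is a minimal sketch of how the spans could be used to decide whether a given index is quoted. The names QUOTED, quoted_spans and is_quoted are made up for illustration, and the pattern is the one from the question, so escaped quotes and triple-quoted strings are not handled.

```python
import re

# Pattern from the question: double- or single-quoted runs,
# with no support for escapes or triple quotes.
QUOTED = re.compile(r"\"[^\"]+\"|'[^']+'")

def quoted_spans(text):
    """Return (start, end) pairs for each quoted run; end is exclusive."""
    return [m.span() for m in QUOTED.finditer(text)]

def is_quoted(text, index):
    """True if the character at `index` lies inside a quoted run (quotes included)."""
    return any(start <= index < end for start, end in quoted_spans(text))

sample = "x = 'c' + c"
print(quoted_spans(sample))   # [(4, 7)]
print(is_quoted(sample, 5))   # True: the c inside the quotes
print(is_quoted(sample, 10))  # False: the bare c at the end
```

If many indexes are checked against the same text, it is probably worth computing the spans once and reusing the list rather than rescanning on every call.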

answered Oct 02 '22 08:10

Dave Kirby

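# Get the indices of all occurrences, including overlapping ones
import re

S = input()  # source string
k = input()  # regex pattern to search for

pattern = re.compile(k)
r = pattern.search(S)
if not r:
    print("(-1, -1)")
while r:
    # r.end() is exclusive, so subtract 1 to print an inclusive end index
    print("({0}, {1})".format(r.start(), r.end() - 1))
    # Resume one character after the previous match start so overlaps are found
    r = pattern.search(S, r.start() + 1)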

answered Oct 02 '22 07:10

Be Champzz

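This should solve your issue:

pattern = r"(?=(\"[^\"]+\"|'[^']+'))"

Then use the following to get the indices of all matches, including overlapping ones, where input is the string being searched. The end index is inclusive here, hence the -1:

indicesTuple = [(mObj.start(1), mObj.end(1) - 1) for mObj in re.finditer(pattern, input)]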
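As a rough illustration of what the lookahead changes, the sketch below runs both patterns on the same made-up sample string. Because the lookahead consumes no characters, a closing quote can also be tried as an opening quote, so the text between two quoted strings is reported as a match as well. That may or may not be what you want when deciding whether an index is really quoted. The variable names are illustrative only, and spans are shown with exclusive ends.

```python
import re

plain = r"\"[^\"]+\"|'[^']+'"
lookahead = r"(?=(\"[^\"]+\"|'[^']+'))"

text = '"a" b "c"'

# Non-overlapping matches: each quote character belongs to at most one match.
plain_spans = [m.span() for m in re.finditer(plain, text)]

# Overlapping matches: the lookahead itself is zero-width, so group 1
# holds the text that would have matched from each starting position.
overlap_spans = [m.span(1) for m in re.finditer(lookahead, text)]

print(plain_spans)    # [(0, 3), (6, 9)]
print(overlap_spans)  # [(0, 3), (2, 7), (6, 9)]
```

The extra span (2, 7) covers `" b "`, which sits between the two real quoted strings.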

answered Oct 02 '22 08:10

Omkar Rahane
