I am putting together a fairly complex regular expression. One part of the expression matches strings such as '+a', '-57' etc. A + or a - followed by any number of letters or numbers. I want to match 0 or more strings matching this pattern.
This is the expression I came up with:
([\+-][a-zA-Z0-9]+)*
If I were to search the string '-56+a' using this pattern I would expect to get two matches:
+a and -56
However, I only get the last match returned:
>>> m = re.match("([\+-][a-zA-Z0-9]+)*", '-56+a') >>> m.groups() ('+a',)
Looking at the python docs I see that:
If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. >>> m.group(1) # Returns only the last match. 'c3'
So, my question is: how do you access multiple group matches?
Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.
Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence. In this tutorial, we'll explore how to use non-capturing groups in Java Regular Expressions.
Drop the *
from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...)
or re.finditer
(see here) to return all matches.
Update:
It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With