I'm working with a Python regex for extracting time durations in '2h30m' format.
I've run into an issue where non-capturing groups ((?:...)) are getting captured inside named groups.
For example, matching 2h30m against:
(?P<hours>\d+(?:h))?(?P<minutes>\d+(?:m))?
would give {'hours': '2h', 'minutes': '30m'}, and not 2 and 30.
The workaround would be to use positive lookahead assertions ((?=...)), but a lookahead doesn't consume the text it checks, so we have to repeat the h and m suffixes:
(?P<hours>\d+(?=h))?h?(?P<minutes>\d+(?=m))?m?
Is there a better way to do this?
Non-capturing groups don't "anti-capture" what they match: they don't remove it from any enclosing capturing group. They're just a way to group things together so you can apply quantifiers to them without creating a group of their own.
To get the effect you want, you can rearrange the groups to put the non-capturing groups outside the capturing groups:
(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?
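A quick way to check the rearranged pattern is to run it against a few inputs. The sketch below is only illustrative: the names DURATION and parse_duration are made up for this example, and treating a missing component as 0 is an assumption, not something the question asks for.

```python
import re

# Pattern from the answer: each suffix sits outside its named group,
# so the group captures only the digits.
DURATION = re.compile(r"(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?")


def parse_duration(text):
    """Hypothetical helper: return hours and minutes as ints, or None if no match.

    Assumption: a component that is absent from the input counts as 0.
    """
    match = DURATION.fullmatch(text)
    if match is None:
        return None
    return {
        name: int(value) if value is not None else 0
        for name, value in match.groupdict().items()
    }


print(DURATION.fullmatch("2h30m").groupdict())  # {'hours': '2', 'minutes': '30'}
print(parse_duration("45m"))                    # {'hours': 0, 'minutes': 45}
```

Since both parts are optional, the pattern also matches the empty string, so you may want to reject empty input separately.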