Non-capturing group inside a named group

Tags:

python

regex

I'm working with a Python regex for extracting time durations in the format '2h30m'. The problem is that the text matched by non-capturing groups ((?:...)) still ends up in the named groups that contain them.

e.g. matching 2h30m against:

(?P<hours>\d+(?:h))?(?P<minutes>\d+(?:m))?

gives {'hours': '2h', 'minutes': '30m'} instead of the bare numbers 2 and 30.
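A minimal sketch reproducing the problem. It assumes the pattern is applied with re.fullmatch (the question doesn't say which matching function is used); the variable names are illustrative only.

```python
import re

# The pattern from the question: each suffix sits inside its named group,
# so the named group captures the suffix along with the digits.
pattern = re.compile(r"(?P<hours>\d+(?:h))?(?P<minutes>\d+(?:m))?")

match = pattern.fullmatch("2h30m")
print(match.groupdict())  # {'hours': '2h', 'minutes': '30m'}
```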

One workaround is to use positive lookahead assertions ((?=...)). However, a lookahead doesn't consume the characters it checks (the match position doesn't advance past them), so the h and m suffixes have to be matched again after each group:

(?P<hours>\d+(?=h))?h?(?P<minutes>\d+(?=m))?m?
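A small sketch of the lookahead workaround, again assuming re.fullmatch. It shows that the named groups now hold only the digits, and also one side effect of matching h? and m? separately from the groups: a bare suffix such as "h" is accepted with both groups empty.

```python
import re

# (?=h) and (?=m) only check that the suffix follows; they consume nothing,
# so h? and m? must match the suffix letters afterwards.
pattern = re.compile(r"(?P<hours>\d+(?=h))?h?(?P<minutes>\d+(?=m))?m?")

for text in ["2h30m", "2h", "45m"]:
    print(text, pattern.fullmatch(text).groupdict())

# Because h? is independent of the hours group, a lone suffix also matches.
print(pattern.fullmatch("h").groupdict())  # {'hours': None, 'minutes': None}
```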

Is there a better way to do this?

asked Nov 07 '15 by megapctr

1 Answer

Non-capturing groups don't "anti-capture" what they match; they don't remove that text from any group that encloses them. They're just a way to group parts of a pattern together, for example so that a quantifier applies to the whole group.
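A short illustration of that point (an editor's sketch, not part of the original answer): a non-capturing group creates no group of its own, but a capturing group around it still contains everything it matched. The group name outer is hypothetical.

```python
import re

# (?:ab)+ groups "ab" so that + repeats the pair; no group is created.
m = re.fullmatch(r"(?:ab)+", "ababab")
print(m.groups())  # ()

# Inside a capturing group, the text matched by (?:ab)+ is still captured.
m = re.fullmatch(r"(?P<outer>x(?:ab)+)", "xabab")
print(m.group("outer"))  # xabab
```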

To get the effect you want, rearrange the groups so that each non-capturing group sits outside its capturing group. The suffix then lives inside the non-capturing group but outside the named group:

(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?
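A minimal sketch applying the rearranged pattern. The helper parse_duration and the conversion to datetime.timedelta are illustrative assumptions, not something the question or answer asks for; the pattern itself is the one from the answer.

```python
import re
from datetime import timedelta

# Each non-capturing group wraps a named group plus its suffix: the suffix is
# matched (and made optional together with its number) but is not captured.
DURATION = re.compile(r"(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?")

def parse_duration(text):
    """Hypothetical helper: convert strings like '2h30m' to a timedelta."""
    match = DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    # Groups that did not take part in the match are None; treat them as zero.
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes") or 0)
    return timedelta(hours=hours, minutes=minutes)

print(DURATION.fullmatch("2h30m").groupdict())  # {'hours': '2', 'minutes': '30'}
print(parse_duration("2h30m"))  # 2:30:00
print(parse_duration("45m"))    # 0:45:00
```

Unlike the lookahead version, a bare "h" no longer matches, since the suffix is only allowed together with its digits. Both parts are still optional, though, so the empty string matches and yields a zero duration; add an explicit check if that should be rejected.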
answered Sep 23 '22 by user2357112 supports Monica