
Regex: match a character only if a specific character appeared before (conditional regex)

I'm developing a regex for (Israeli) mobile phone numbers. Currently I have:

re.compile(r'^[\(]?0?(5[023456789])\)?(\-)?\d{7}$')

which catches most use cases. The problem is matching the second parenthesis only if the first parenthesis appears.

So (055)-5555555, (055)5555555 and 0555555555 should match, but 055)-5555555 shouldn't. I know I can use two regexes to test for the condition (if the first one matches, test for the other condition), but that doesn't seem like a smart solution.

I guess I need something like a regex lookaround, but I'm not sure how to use it, or whether I understand the concept correctly.

Edit: explaining the logic

The area code should start with 5 followed by a single digit (from a specific list), with an optional zero before it. The area code may also be inside parentheses. Then come an optional hyphen and 7 digits.

Clarification: I need to match each parenthesis only if the other one exists. That is true for the first one too, not only for the second one; I missed that point.
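
To make the problem concrete, here is a small sketch that runs the current pattern over the examples above. The names `current`, `samples` and `results` are only for illustration, and `(055-5555555` is an extra sample added to cover the clarification (opening parenthesis without a closing one).

```python
import re

# The pattern from the question: each parenthesis is optional on its own,
# so nothing ties the closing one to the opening one.
current = re.compile(r'^[\(]?0?(5[023456789])\)?(\-)?\d{7}$')

samples = [
    '(055)-5555555',   # balanced, should match
    '(055)5555555',    # balanced, should match
    '0555555555',      # no parentheses, should match
    '055)-5555555',    # closing only, should NOT match
    '(055-5555555',    # opening only, should NOT match
]

# Map each sample to whether the current pattern accepts it.
results = {s: current.match(s) is not None for s in samples}
for s, accepted in results.items():
    print(s, accepted)
```

With this pattern every sample is accepted, including the two unbalanced ones, which is exactly the behaviour I want to get rid of.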

alonisser asked Dec 16 '22 13:12


2 Answers

First, capture the opening parenthesis, then use a conditional pattern that is applied only if the opening parenthesis matched. (I know the link is to php.net, but I find it useful when referencing regexes; it also includes an example that exactly matches your case.)

The pattern

^(\()?0?(5[02-9])(?(1)\))-?(\d{7})$

will match:

(055)-5555555
(055)5555555 
0555555555

but not:

055)-5555555

Captured groups

  1. The opening parenthesis (empty if not found)
  2. Area code (e.g. 55)
  3. The phone number (e.g. 5555555)

How it works

The part (\()? matches the opening parenthesis. It's optional.

The part (?(1)\)) checks whether the first capture group (in our case the opening parenthesis) matched. If it did, the string must also contain the closing parenthesis at this position.

If no opening parenthesis was found, the conditional matches nothing, so a closing parenthesis at that position makes the whole match fail.
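
Since the question uses Python, here is a minimal sketch of the same pattern with Python's re module, which also supports the (?(1)...) conditional. The helper name `parse` is made up for this example.

```python
import re

# Same pattern as above: group 1 is the optional opening parenthesis,
# and (?(1)\)) demands a closing one only when group 1 took part in the match.
phone = re.compile(r'^(\()?0?(5[02-9])(?(1)\))-?(\d{7})$')

def parse(number):
    """Return the captured groups, or None if the number is rejected."""
    m = phone.match(number)
    return m.groups() if m else None

for s in ['(055)-5555555', '(055)5555555', '0555555555',
          '055)-5555555', '(055-5555555']:
    print(s, parse(s))
```

Note that in Python a group that did not participate in the match is reported as None rather than as an empty string, so group 1 is None for 0555555555. Both unbalanced inputs are rejected.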

kjetilh answered Dec 21 '22 11:12


Use the (?(id/name)yes-pattern|no-pattern) syntax to match the closing parenthesis only if the opening parenthesis matched:

re.compile(r'^(\()?0?(5[023456789])(?(1)\))-?\d{7}$')

The (?(1)\)) part matches \) only if group 1 matched (the |no-pattern part is optional).

Demo:

>>> import re
>>> phone = re.compile(r'^(\()?0?(5[023456789])(?(1)\))-?\d{7}$')
>>> phone.search('(055)-5555555')
<_sre.SRE_Match object at 0x101e18a48>
>>> phone.search('055)-5555555') is None
True
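
The syntax accepts a group name as well as a number, so the same idea can be sketched with a named group and re.VERBOSE for readability. The group name `open` and the layout are just illustrative choices, not part of the original pattern.

```python
import re

phone = re.compile(r'''
    ^
    (?P<open>\()?      # optional opening parenthesis, remembered as "open"
    0?                 # optional leading zero
    (5[023456789])     # area code: 5 plus one allowed digit
    (?(open)\))        # closing parenthesis, required only if "open" matched
    -?                 # optional hyphen
    \d{7}              # the 7-digit number
    $
''', re.VERBOSE)

print(phone.search('(055)-5555555') is not None)
print(phone.search('055)-5555555') is None)
print(phone.search('(055-5555555') is None)
```

Because the opening parenthesis is itself optional and the closing one is tied to it, a lone parenthesis on either side is rejected.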
Martijn Pieters answered Dec 21 '22 11:12