 

Optional matching in regex

Tags:

python

regex

I'm attempting to match these input strings into three capture groups (Regex101 link):

    | input string  | x  | y   | z  |
------------------------------------
  I | a             | a  |     |    |
 II | a - b         | a  | b   |    |
III | a - b-c       | a  | b-c |    |
 IV | a - b, 12     | a  | b   | 12 |
  V | a - 12        | a  |     | 12 |
 VI | 12            |    |     | 12 |

So the anatomy of the input strings is as follows:

  • an optional first part: free text up to a hyphen surrounded by whitespace (" - ") or the end of the input string
  • an optional second part: any characters after that first " - ", up to a comma or the end of the input string
  • optionally, exactly two digits at the end
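For comparison, the same rules can be applied without a regex at all. The sketch below swaps in plain str.partition instead of a pattern; the helper names are hypothetical, and it assumes (the question does not say so) that a lone two-digit value always belongs to z rather than x or y.

```python
def is_two_digits(t):
    # str.isdigit() also accepts characters such as "²", so require ASCII too
    return len(t) == 2 and t.isascii() and t.isdigit()


def split_parts(s):
    """Return (x, y, z) for one input string (hypothetical helper)."""
    # x: everything before the first " - " (the whole string if there is none)
    x, _, rest = s.partition(" - ")
    # y: everything after that, up to the first ", "; whatever follows is z
    y, _, z = rest.partition(", ")
    # Assumption: a lone two-digit value is z, never x or y
    if not z and is_two_digits(y):
        y, z = "", y
    if not rest and not z and is_two_digits(x):
        x, z = "", x
    return x, y, z


for s in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(f"{s!r:14} -> {split_parts(s)}")
```

Splitting on " - " (with the spaces) rather than on a bare "-" is what keeps "b-c" intact in case III.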

I've tried many different solutions; this is my current attempt:

^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$

It handles cases II, IV and V acceptably (the results still need some whitespace trimming). However:

  • I and VI do not match at all
  • III is split at the last hyphen rather than the first
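The behaviour described above can be reproduced with a short Python check; the pattern is the one from the question, while the variable name ATTEMPT is just illustrative. It also shows that in case IV the y group comes back as " b, ", so the "trimming" has to strip the comma as well as whitespace.

```python
import re

# The pattern from the question, unchanged
ATTEMPT = re.compile(r"^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$")

for s in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    m = ATTEMPT.match(s)
    # The literal "-" is mandatory, so "a" and "12" print None;
    # the greedy x only backtracks to the LAST "-", so "a - b-c" gives x == "a - b"
    print(f"{s!r:14} -> {m.groupdict() if m else None}")
```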
asked Dec 23 '22 at 18:12 by salient



1 Answer

This seems to do reasonably well:

^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$

The values of interest (x, y and z) will be in groups 1, 2 and 3, respectively.
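Running the pattern over the six inputs from the question in Python makes the group assignment visible, including where the two digits land for V and VI. The helper name xyz is illustrative; Match.groups(default="") is used so that groups which did not take part print as "" instead of None.

```python
import re

ANSWER = re.compile(r"^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$")


def xyz(s):
    m = ANSWER.match(s)
    # groups(default="") reports non-participating groups as "" instead of None
    return m.groups(default="") if m else None


for s in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(f"{s!r:14} -> {xyz(s)}")
```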

The only catch is that the two digits will end up

  • in group 2 for case V and
  • in group 1 for case VI,

the other groups being empty in those cases.

This is because "two digits" happily matches the "free text until the delimiter, or the string ends" rule.

You could use negative lookaheads to force the two digits into the last group, but that is only correct if a two-digit value can never be a legitimate value for groups 1 and 2. Either way, it makes the expression unwieldy:

^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$
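A quick Python check of the lookahead version against the question's inputs (again with an illustrative helper name) shows it reproduces the desired table exactly, with the digits in group 3 for V and VI.

```python
import re

LOOKAHEAD = re.compile(
    r"^(?:((?!\d\d$).*?)(?: - |$))?"  # x: may not start where only two digits remain
    r"(?:((?!\d\d$).*?)(?:, |$))?"    # y: same guard
    r"(\d\d$)?$"                      # z: two digits at the very end
)


def xyz(s):
    m = LOOKAHEAD.match(s)
    return m.groups(default="") if m else None


for s in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(f"{s!r:14} -> {xyz(s)}")
```

The flip side is the one mentioned above: an input whose y legitimately is two digits, such as "a - 34", will have those digits reported as z.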

Breakdown:

^                    # string starts
(?:(.*?)(?: - |$))?  # any text, reluctantly, and " - " or the string ends
(?:(.*?)(?:, |$))?   # any text, reluctantly, and ", " or the string ends
(\d\d$)?             # two digits and the string ends
$                    # string ends
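If you want to keep these comments inside the pattern itself, Python's re.VERBOSE flag allows it. One caveat: verbose mode ignores unescaped whitespace, so the literal spaces in " - " and ", " have to be written as "\ ", otherwise " - " silently turns into a bare "-". A sketch of the same (non-lookahead) pattern:

```python
import re

# Same pattern as the one-liner, written with re.VERBOSE so the breakdown
# comments live inside it. Spaces that must match literally are escaped.
VERBOSE = re.compile(
    r"""
    ^                        # string starts
    (?:(.*?)(?:\ -\ |$))?    # any text, reluctantly, and " - " or the string ends
    (?:(.*?)(?:,\ |$))?      # any text, reluctantly, and ", " or the string ends
    (\d\d$)?                 # two digits and the string ends
    $                        # string ends
    """,
    re.VERBOSE,
)

print(VERBOSE.match("a - b, 12").groups())  # ('a', 'b', '12')
```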
answered Jan 06 '23 at 19:01 by Tomalak
