I'm attempting to split these input strings into three capture groups (Regex101 link):
|     | input string | x | y   | z  |
|-----|--------------|---|-----|----|
| I   | a            | a |     |    |
| II  | a - b        | a | b   |    |
| III | a - b-c      | a | b-c |    |
| IV  | a - b, 12    | a | b   | 12 |
| V   | a - 12       | a |     | 12 |
| VI  | 12           |   |     | 12 |
So the anatomy of the input strings is as follows:
- an optional first part with free text up until a hyphen with surrounding whitespace (" - ") or the end of the input string
- an optional second part with any characters after that first hyphen, up until a comma or the end of the input string
- optionally, exactly two digits at the end
I've tried a plethora of different solutions; this is my current attempt:
^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$
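A quick way to see exactly what this attempt returns is to run it over the six sample rows. The sketch below assumes Python 3's standard `re` module (the named groups already use Python syntax); `SAMPLES` and `try_attempt` are hypothetical helper names introduced here:

```python
import re

# The current attempt, copied verbatim from above.
ATTEMPT = re.compile(r"^(?P<x>.*)(?:-)(?P<y>.*)(?<!\d)(?P<z>\d{0,2})(?!\d)$")

# Hypothetical test data: the six sample inputs from the table, keyed by row.
SAMPLES = {
    "I": "a",
    "II": "a - b",
    "III": "a - b-c",
    "IV": "a - b, 12",
    "V": "a - 12",
    "VI": "12",
}


def try_attempt(text):
    """Return the (x, y, z) groups, or None when the pattern does not match."""
    m = ATTEMPT.match(text)
    return m.group("x", "y", "z") if m else None


for label, text in SAMPLES.items():
    print(f"{label:>3}: {text!r:12} -> {try_attempt(text)!r}")
```

The mandatory `(?:-)` means inputs without a hyphen (I and VI) cannot match at all, and the greedy `(?P<x>.*)` backtracks only as far as the last hyphen, so III is split there.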
It handles scenarios II, IV and V OK (although the whitespace still needs trimming); however:
- I and VI are not returned at all
- III is not split at the first hyphen but at the last

This seems to do reasonably well:
^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$
The values of interest will be in groups 1, 2 and 3, respectively.
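Running this pattern over the same six samples makes the group assignment concrete. This is a sketch assuming Python 3's `re`; `split_parts` is a hypothetical helper that maps groups which did not participate (`None`) to empty strings:

```python
import re

# The pattern above; groups 1, 2 and 3 hold x, y and z.
PATTERN = re.compile(r"^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$")

SAMPLES = ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]


def split_parts(text):
    """Return (x, y, z) as strings, using '' for groups that did not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return tuple(g or "" for g in m.groups())


for text in SAMPLES:
    print(f"{text!r:12} -> {split_parts(text)}")
```

Rows I to IV come out as in the table, while V and VI give ('a', '12', '') and ('12', '', ''): the two digits land in a free-text group.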
The only culprit is that "two digits" will be captured by group 1 (case VI) or group 2 (case V) instead of group 3, with the groups after it being empty in those cases.
This is because "two digits" happily matches the "free text until the delimiter, or the string ends" rule.
You could use negative look-aheads to force the two digits into the last group, but that is only correct if "two digits" is never a legal value for groups 1 and 2. In any case, it makes the expression unwieldy:
^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$
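The guarded version can be checked the same way; again a Python 3 sketch, with `split_guarded` a hypothetical helper:

```python
import re

# Variant with negative look-aheads: groups 1 and 2 refuse to start at a
# position where exactly two digits are all that remains of the string.
GUARDED = re.compile(
    r"^(?:((?!\d\d$).*?)(?: - |$))?(?:((?!\d\d$).*?)(?:, |$))?(\d\d$)?$"
)


def split_guarded(text):
    """Return (x, y, z) as strings, using '' for groups that did not match."""
    m = GUARDED.match(text)
    return tuple(g or "" for g in m.groups()) if m else None


for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    print(f"{text!r:12} -> {split_guarded(text)}")
```

All six rows now match the table. Because the look-ahead only fires when the two digits end the string, "12 - b" still yields ('12', 'b', ''), but an input like "a - 12" can never put "12" into group 2.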
Breakdown:
^                     # string starts
(?:(.*?)(?: - |$))?   # any text, reluctantly, and " - " or the string ends
(?:(.*?)(?:, |$))?    # any text, reluctantly, and ", " or the string ends
(\d\d$)?              # two digits and the string ends
$                     # string ends
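If the engine supports a free-spacing mode (Python's `re.VERBOSE`, the `x` flag in PCRE), the breakdown can double as the pattern itself. A sketch assuming Python 3; note that verbose mode ignores unescaped whitespace, so the literal spaces in " - " and ", " have to be written as `\ `:

```python
import re

# The breakdown above as a verbose pattern. Unescaped whitespace is ignored
# in this mode, so the spaces around the hyphen and after the comma are
# escaped with a backslash.
VERBOSE = re.compile(
    r"""
    ^                       # string starts
    (?:(.*?)(?:\ -\ |$))?   # any text, reluctantly, and " - " or the string ends
    (?:(.*?)(?:,\ |$))?     # any text, reluctantly, and ", " or the string ends
    (\d\d$)?                # two digits and the string ends
    $                       # string ends
    """,
    re.VERBOSE,
)

# The compact form, for comparison.
COMPACT = re.compile(r"^(?:(.*?)(?: - |$))?(?:(.*?)(?:, |$))?(\d\d$)?$")

# Both forms should capture identical groups on every sample.
for text in ["a", "a - b", "a - b-c", "a - b, 12", "a - 12", "12"]:
    assert VERBOSE.match(text).groups() == COMPACT.match(text).groups()
```

Forgetting to escape those spaces silently turns " - " into "-", which would split "a - b-c" at the wrong place again.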