Regex: individually optional capture groups, but collectively at least one must exist

Tags:

regex

I've implemented this with multiple regexes, but I'm interested to know if it's possible to do it with one.

I've got some strings representing a duration. Those strings have a format exemplified by "4d10h30m", representing a duration of four days, ten hours and thirty minutes. Each unit in the duration (days, hours or minutes) is optional, so "4d" is a valid string, as is "10h30m".

What I would like is a regex (JavaScript, if it matters) that reliably returns three capture groups, each containing the value of one unit. So in the "4d10h30m" example, matching the regex against this string should return ["4", "10", "30"]. If a unit is missing, its place in the tuple can contain pretty much anything that isn't a nonzero integer (0, "0", null, or an empty string are all fine).

The two approaches I've considered are as follows:

/(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?/

which matches the empty string; and some variation of:

/((?:\d+[dhm]){1,3})/

which makes it awkward to capture just the \d+ and will return an uncertain number of capture groups.
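Both problems are easy to see by running the two patterns in JavaScript. This is a minimal sketch that assumes Node.js (the output comments show Node's `console.log` formatting). `groups` is a hypothetical helper, not part of any library. Because neither pattern is anchored, the first one also "succeeds" on invalid input by matching zero characters at index 0.

```javascript
// First attempt: every group is optional, so the pattern can match zero characters.
const allOptional = /(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?/;

// Second attempt: one capture group wrapped around a repeated "number + unit".
const repeatedUnit = /((?:\d+[dhm]){1,3})/;

// Hypothetical helper: returns the capture groups of the first match, or null.
function groups(re, input) {
  const m = re.exec(input);
  return m === null ? null : m.slice(1);
}

console.log(groups(allOptional, ""));           // [ undefined, undefined, undefined ]
console.log(groups(allOptional, "xd5m"));       // [ undefined, undefined, undefined ]
console.log(groups(repeatedUnit, "4d10h30m"));  // [ '4d10h30m' ]
```

The second pattern puts every unit into one group, so the numbers would still have to be split out afterwards.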

I suspect the latter is a non-starter. The former would work if there were a regex construct that says "these groups are individually optional, but collectively at least one of them must be present". Is there such a construct? This seems doable within the restrictions of finite automata, but I don't know how it would be expressed in a regex, or even whether it can be.

EDIT:

By request, some example inputs and their outputs:

2d1h5m # ["2", "1", "5"]
3h20m  # ["", "3", "20"]
4d10m  # ["4", "", "10"]
2d     # ["2", "", ""]
6h     # ["", "6", ""]
1x20y  # no match (invalid units)
2dh20m # no match (no units allowed without a value)
21020  # no match (no units)
1h2d5m # no match (disordered units)
xd5m   # no match (non-numeral value)
asked Jun 30 '14 at 15:06 by R Hill



1 Answer

Add an anchored negative look-ahead to your regex to assert that there's some input:

^(?!$)(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?$

The expression (?!$) means "this point must not be followed by end of input". Anchored to the start of input, ^(?!$) means "the start can't be followed by the end", which is the same as saying "there must be some input".
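A minimal sketch of the answer's pattern in JavaScript, run against the example inputs from the question plus the empty string. `parseDuration` is a hypothetical helper name. JavaScript reports an unmatched group as `undefined`, so the sketch maps it to `""`, which the question lists as acceptable.

```javascript
// The answer's pattern: ^(?!$) rejects empty input, each unit group is optional,
// and the ^...$ anchors make the whole string match the format.
const DURATION = /^(?!$)(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?$/;

// Hypothetical helper: returns [days, hours, minutes] as strings,
// with "" for a missing unit, or null if the input is not a valid duration.
function parseDuration(input) {
  const m = DURATION.exec(input);
  if (m === null) return null;
  // Unmatched groups come back as undefined; replace them with "".
  return m.slice(1).map((g) => g ?? "");
}

const samples = ["2d1h5m", "3h20m", "4d10m", "2d", "6h",
                 "1x20y", "2dh20m", "21020", "1h2d5m", "xd5m", ""];
for (const s of samples) {
  console.log(JSON.stringify(s), JSON.stringify(parseDuration(s)));
}
```

The `??` operator needs an ES2020 engine; `g === undefined ? "" : g` is an equivalent fallback for older ones.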

Using an anchored look-ahead is a handy way to assert the overall length of the input in regexes that otherwise only assert its format.
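As a hypothetical illustration of that idea, the look-ahead below caps a duration string at 8 characters (an arbitrary limit chosen only for this example) while the rest of the pattern checks the format. Its lower bound of 1 also does the job of `(?!$)`, so the empty string is still rejected.

```javascript
// Hypothetical variant: (?=.{1,8}$) asserts the whole input is 1 to 8 characters
// long; the remainder of the pattern asserts the days/hours/minutes format.
const SHORT_DURATION = /^(?=.{1,8}$)(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?$/;

console.log(SHORT_DURATION.test("4d10h30m"));   // true  (8 characters)
console.log(SHORT_DURATION.test("40d10h30m"));  // false (9 characters)
console.log(SHORT_DURATION.test(""));           // false (0 characters)
```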

See a live demo of this regex with your sample input including blank input showing capture of the units in the correct groups, and not matching the blank input.

answered Sep 29 '22 at 13:09 by Bohemian
