I want to match strings in the format of A0123456, E0123456, or IN:A0123456Q, etc. I originally made this regex:
^(IN:)?[AE][0-9]{7}Q?$
but it was matching IN:E0123456 without the Q at the end. So I created this regex:
(^IN:[AE][0-9]{7}Q$)|(^[AE][0-9]{7}$)
Is there any way to shorten this regex so that it requires both IN: and Q if either is present, but still matches when neither is present?
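To make the problem concrete, here is a small Ruby check of the two patterns above against a few sample strings. It is only a sketch: the constant names LOOSE and STRICT and the sample list are made up for illustration.

```ruby
# LOOSE and STRICT are illustrative names, not part of the application.
LOOSE  = /^(IN:)?[AE][0-9]{7}Q?$/
STRICT = /(^IN:[AE][0-9]{7}Q$)|(^[AE][0-9]{7}$)/

samples = %w[A0123456 E0123456 IN:A0123456Q IN:A0123456 A0123456Q]

samples.each do |s|
  # String#match? does not exist on Ruby 2.0, so coerce =~ to a boolean.
  puts format("%-14s loose=%-5s strict=%s", s, !!(s =~ LOOSE), !!(s =~ STRICT))
end
```

The last two samples are the ones that matter: the first pattern accepts them because the prefix and suffix are optional independently of each other.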
Edit: The regex will be used in Ruby.
Edit 2: I changed the regex to reflect that I was matching the wrong strings, as it would still match IN:A0123456.
Edit 3: Both answers below are valid, but since I am using Ruby 2.0 and prefer a regex I can keep using if I change my application and don't want to depend on the Ruby flavor of subexpression calls, I chose to accept matt's answer.
The second regex has a problem:
^(IN:[AE][0-9]{7}Q)|([AE][0-9]{7})$
The | has lower precedence than concatenation, so the regex will be parsed as:
^(IN:[AE][0-9]{7}Q) # Starts with (IN:[AE][0-9]{7}Q)
| # OR
([AE][0-9]{7})$ # Ends with ([AE][0-9]{7})
To fix this problem, just use a non-capturing group:
^(?:(IN:[AE][0-9]{7}Q)|([AE][0-9]{7}))$
This makes sure the whole input string matches one of the two formats, rather than merely starting with one format or ending with the other (which is clearly incorrect).
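A quick way to see the difference is to run both versions against strings with extra text before or after the code. This is a minimal sketch; BROKEN and FIXED are illustrative names and the sample inputs are invented.

```ruby
# BROKEN and FIXED are illustrative names only.
BROKEN = /^(IN:[AE][0-9]{7}Q)|([AE][0-9]{7})$/
FIXED  = /^(?:(IN:[AE][0-9]{7}Q)|([AE][0-9]{7}))$/

# In BROKEN the first alternative is anchored only at the start,
# and the second alternative is anchored only at the end.
%w[IN:A0123456Qxyz xyzA0123456 IN:A0123456Q A0123456].each do |s|
  puts "#{s}: broken=#{!!(s =~ BROKEN)} fixed=#{!!(s =~ FIXED)}"
end
```

Keep in mind that in Ruby ^ and $ match at line boundaries; if the input can contain newlines, \A and \z anchor to the whole string instead.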
Regarding shortening the regex, you may replace [0-9] with \d if you want to, but it is fine as it is.
I don't think there is any other way to shorten the regex within the default level of support of Ruby.
Just for your information, in Perl/PCRE, you can shorten it with a subroutine call:
^(?:([AE][0-9]{7})|(IN:(?1)Q))$
(?1) refers to the pattern defined by the first capturing group, i.e. [AE][0-9]{7}. The regex is effectively the same; it just looks shorter. This demo with input IN:E0123463Q shows the whole text being captured by group 2 (and no text captured for group 1).
In Ruby, a similar concept, the subexpression call, exists with slightly different syntax. Ruby uses \g<name> or \g<number> to refer to the capturing group whose pattern we want to reuse:
^(?:([AE][0-9]{7})|(IN:\g<1>Q))$
The test case here on rubular under Ruby 1.9.7, for input IN:E0123463Q, returns E0123463 as match for group 1 and IN:E0123463Q as match for group 2.
Ruby's (1.9.7) implementation seems to record the captured text for group 1 even when group 1 is not directly involved in the matching. In PCRE, subroutine calls do not capture text.
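If you would rather check this locally than on rubular, something along these lines should do. It is a sketch only: SUBEXP is an illustrative name, and what gets printed for group 1 may depend on your Ruby version.

```ruby
# SUBEXP is an illustrative name; \g<1> re-runs the pattern of group 1.
SUBEXP = /^(?:([AE][0-9]{7})|(IN:\g<1>Q))$/

m = SUBEXP.match("IN:E0123463Q")
puts m[2]          # text captured by the second group
puts m[1].inspect  # whatever Ruby recorded for group 1 via the subexpression call

puts !!(SUBEXP =~ "E0123463")     # plain form
puts !!(SUBEXP =~ "IN:E0123463")  # prefix without the Q suffix
puts !!(SUBEXP =~ "E0123463Q")    # Q suffix without the prefix
```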
There is also conditional regex that allows you to check whether a certain capturing group matches something or not. You can check matt's answer for more information.
If you are using Ruby 2.0, you can use an if-then-else conditional match (undocumented in the Ruby docs, but does exist):
/^(IN:)?[AE][0-9]{7}(?(1)Q|)$/
The conditional part is (?(1)Q|) which says if group number 1 matched, then match Q, else match nothing. Since group number 1 is (IN:), this achieves what you want.
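Here is a short sketch of how that behaves on the four interesting inputs, assuming Ruby 2.0 or later. CONDITIONAL is just an illustrative name.

```ruby
# CONDITIONAL is an illustrative name. Needs Ruby 2.0+ for (?(1)...|...).
CONDITIONAL = /^(IN:)?[AE][0-9]{7}(?(1)Q|)$/

%w[A0123456 IN:A0123456Q IN:A0123456 A0123456Q].each do |s|
  # Q is required only when group 1 (the IN: prefix) took part in the match.
  puts "#{s}: #{!!(s =~ CONDITIONAL)}"
end
```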