I recently came across a puzzle to find a regular expression that matches:
5-character-long strings comprised of lowercase English letters in ascending ASCII order
Valid examples include:
aaaaa
abcde
xxyyz
ghost
chips
demos
Invalid examples include:
abCde
xxyyzz
hgost
chps
My current solution is kludgy. I use the regex:
(?=^[a-z]{5}$)^(a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*)$
which uses a non-consuming capture group to assert a string length of 5, and then verifies that the string comprises of lowercase English letters in order (see Rubular).
Instead, I'd like to use back references inside character classes. Something like:
^([a-z])([\1-z])([\2-z])([\3-z])([\4-z])$
The logic for the solution (see Rubular) in my head is to capture the first character [a-z], use it as a backrefence in the second character class and so on. However, \1
, \2
... within character classes seem to refer to ASCII values of 1, 2... effectively matching any four- or five-character string.
I have 2 questions:
The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group. The / before it is a literal character. It is simply the forward slash in the closing HTML tag that we are trying to match.
Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.
For example, the regular expression "[ A-Za-z] " specifies to match any single uppercase or lowercase letter. In the character set, a hyphen indicates a range of characters, for example [A-Z] will match any one capital letter.
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9.
I'm posting this answer more as a comment than an answer since it has better formatting than comments.
Related to your questions:
- Can I use back references in my character classes to check for ascending order strings?
No, you can't. If you take a look a backref regular-expressions section, you will find below documentation:
Parentheses and Backreferences Cannot Be Used Inside Character Classes
Parentheses cannot be used inside character classes, at least not as metacharacters. When you put a parenthesis in a character class, it is treated as a literal character. So the regex [(a)b] matches a, b, (, and ).
Backreferences, too, cannot be used inside a character class. The \1 in a regex like (a)[\1b] is either an error or a needlessly escaped literal 1. In JavaScript it's an octal escape.
Regarding your 2nd question:
- Is there any less-hacky solution to this puzzle?
Imho, your regex is perfectly well, you could shorten it very little at the beginning like this:
(?=^.{5}$)^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$
^--- Here
Regex demo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With