Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alphabetic order regex using backreferences

I recently came across a puzzle to find a regular expression that matches:

5-character-long strings comprised of lowercase English letters in ascending ASCII order

Valid examples include:

aaaaa
abcde
xxyyz
ghost
chips
demos

Invalid examples include:

abCde
xxyyzz
hgost
chps

My current solution is kludgy. I use the regex:

(?=^[a-z]{5}$)^(a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*)$

which uses a non-consuming capture group to assert a string length of 5, and then verifies that the string comprises of lowercase English letters in order (see Rubular).

Instead, I'd like to use back references inside character classes. Something like:

^([a-z])([\1-z])([\2-z])([\3-z])([\4-z])$

The logic for the solution (see Rubular) in my head is to capture the first character [a-z], use it as a backrefence in the second character class and so on. However, \1, \2 ... within character classes seem to refer to ASCII values of 1, 2... effectively matching any four- or five-character string.

I have 2 questions:

  1. Can I use back references in my character classes to check for ascending order strings?
  2. Is there any less-hacky solution to this puzzle?
like image 521
Jedi Avatar asked Jun 30 '17 14:06

Jedi


People also ask

What does this mean in regex ([ ])\ 1?

The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group. The / before it is a literal character. It is simply the forward slash in the closing HTML tag that we are trying to match.

What does regex 0 * 1 * 0 * 1 * Mean?

Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.

What is a ZA Z in regex?

For example, the regular expression "[ A-Za-z] " specifies to match any single uppercase or lowercase letter. In the character set, a hyphen indicates a range of characters, for example [A-Z] will match any one capital letter.

What is difference [] and () in regex?

[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9.


1 Answers

I'm posting this answer more as a comment than an answer since it has better formatting than comments.

Related to your questions:

  1. Can I use back references in my character classes to check for ascending order strings?

No, you can't. If you take a look a backref regular-expressions section, you will find below documentation:

Parentheses and Backreferences Cannot Be Used Inside Character Classes

Parentheses cannot be used inside character classes, at least not as metacharacters. When you put a parenthesis in a character class, it is treated as a literal character. So the regex [(a)b] matches a, b, (, and ).

Backreferences, too, cannot be used inside a character class. The \1 in a regex like (a)[\1b] is either an error or a needlessly escaped literal 1. In JavaScript it's an octal escape.

Regarding your 2nd question:

  1. Is there any less-hacky solution to this puzzle?

Imho, your regex is perfectly well, you could shorten it very little at the beginning like this:

(?=^.{5}$)^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$
    ^--- Here

Regex demo

like image 104
Federico Piazza Avatar answered Oct 07 '22 22:10

Federico Piazza