Is it possible to construct a PCRE-style regular expression that will only match each letter in a list only once?
For example, if you have the letters "lrsa" and you try matching a word list against:
^[lrsa]*m[lrsa]*$
you're going to match "lams" (valid), but also "lamas" (invalid for our purposes because you only had one "a"). If your letter set was "lrsaa", you would want to match "lamas".
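A quick check with Python's re module shows the over-matching described above. This is a sketch: it assumes Python's re as a stand-in for a PCRE-style engine, which behaves the same way for this simple pattern.

```python
import re

# The pattern from the question: any of l, r, s, a, then exactly one m, then more of l, r, s, a.
naive = re.compile(r"^[lrsa]*m[lrsa]*$")

for word in ["lams", "lamas", "lamb"]:
    # bool(...) turns the match object (or None) into True/False
    print(word, bool(naive.match(word)))
# lams True
# lamas True   <- accepted even though only one "a" is available
# lamb False   <- "b" is not in the letter set
```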
Is this possible with regular expressions, or should I handle it programmatically?
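Handling it programmatically amounts to comparing letter counts. The sketch below uses collections.Counter; the helper name can_spell and its required parameter (standing in for the single "m" in the pattern) are hypothetical, chosen for illustration.

```python
from collections import Counter

def can_spell(word, letters, required="m"):
    # Hypothetical helper: the available letters plus the one required letter.
    pool = Counter(letters + required)
    # Counter subtraction drops zero and negative counts, so an empty result
    # means no letter in `word` is used more often than the pool allows.
    overuse = Counter(word) - pool
    # The word must also actually contain the required letter.
    return required in word and not overuse

print(can_spell("lams", "lrsa"))    # True
print(can_spell("lamas", "lrsa"))   # False: only one "a" available
print(can_spell("lamas", "lrsaa"))  # True: two "a"s available
```

Unlike a single fixed regex, this handles duplicated letters in the set directly, because it counts rather than merely checking membership.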
A group of the form (?=...) is a positive lookahead, a type of zero-width assertion. It requires that the current position be followed by whatever is inside the parentheses, but that part is not consumed and is not included in the match. Your example means the match needs to be followed by zero or more characters and then a digit (but again, that part is not included in the match).
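A small Python sketch, assuming the lookahead being discussed is something like (?=.*\d) ("zero or more characters and then a digit"); the exact expression from the original question is an assumption here.

```python
import re

# (?=.*\d) checks that a digit appears somewhere ahead but consumes nothing,
# so \w+ still has to match the whole string starting at position 0.
pattern = re.compile(r"^(?=.*\d)\w+$")

print(bool(pattern.match("abc1")))  # True
print(bool(pattern.match("abcd")))  # False: no digit anywhere

# The text matched inside the lookahead is not part of the result:
print(re.search(r"[a-z]+(?=\d)", "abc123").group())  # abc
```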
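In formal regular-expression notation, (0+1) means 0 OR 1 (the + is union, not the "one or more" quantifier), so (0+1)* matches any sequence of ones and zeroes. In your example, (0+1)*1(0+1)* therefore matches any sequence that contains at least one 1: it does not match 000, but it does match 010, 1, 111, etc. In PCRE syntax the same expression is written [01]*1[01]*.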
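Translated into Python's re syntax, where the formal (0+1) becomes the character class [01], the expression behaves as described. fullmatch is used so the whole string must match, as in the formal definition.

```python
import re

# Formal (0+1)*1(0+1)* written in PCRE/Python syntax.
contains_one = re.compile(r"[01]*1[01]*")

for s in ["000", "010", "1", "111"]:
    # fullmatch requires the entire string to match the pattern
    print(s, bool(contains_one.fullmatch(s)))
# 000 False
# 010 True
# 1 True
# 111 True
```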
The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore, although the exact set varies across engines) and the other side is not a word character (for instance, the beginning or end of the string, or a space character).
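A short Python illustration of \b, including one example of the engine variation: Python 3 treats non-ASCII letters such as "é" as word characters by default, but not when the re.ASCII flag is set.

```python
import re

text = "cat concat cat_ cats cat."

# \bcat\b matches "cat" only where neither neighbour is a word character,
# so "concat", "cat_" and "cats" are skipped.
print(re.findall(r"\bcat\b", text))                        # ['cat', 'cat']
print([m.start() for m in re.finditer(r"\bcat\b", text)])  # [0, 21]

# What counts as a word character depends on the engine and flags.
print(re.findall(r"\b\w+\b", "café"))                      # ['café']
print(re.findall(r"\b\w+\b", "café", flags=re.ASCII))      # ['caf']
```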
For example, \+ matches "+", \[ matches "[", and \. matches ".". Regex engines also recognize common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point (support and exact spelling vary between engines).
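These escapes can be checked in Python's re module, which spells the 8-digit form with a capital U and treats three octal digits as an octal escape rather than a backreference. Other engines differ; PCRE, for instance, uses \x{...} for arbitrary code points. Raw strings are used so that re, not the Python parser, interprets the backslashes.

```python
import re

# Escaped metacharacters are matched literally.
print(bool(re.fullmatch(r"\+\[\.", "+[.")))             # True

# Character escapes interpreted by the regex engine:
print(bool(re.fullmatch(r"a\tb", "a\tb")))              # True: \t is a tab
print(bool(re.fullmatch(r"\101", "A")))                 # True: octal 101 is "A"
print(bool(re.fullmatch(r"\x41", "A")))                 # True: hex 41 is "A"
print(bool(re.fullmatch(r"\u00e9", "é")))               # True: 4-digit Unicode
print(bool(re.fullmatch(r"\U0001F600", "\U0001F600")))  # True: 8-digit Unicode
```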
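You can use a negative lookahead:
^(?!.*?(.).*?\1)[lrsa]*m[lrsa]*$
The lookahead (?!.*?(.).*?\1) captures each character in turn and makes the match fail if that same character appears again later in the word, so "lams" matches but "lamas" does not. Note that it rejects any repeated letter, so it fits letter sets without duplicates; for a set such as "lrsaa", where "lamas" should match, this pattern is too strict.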
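A quick check of the negative-lookahead pattern in Python's re module, used here as a stand-in for a PCRE-style engine (it supports both lookaheads and backreferences):

```python
import re

# (?!.*?(.).*?\1) fails if any character (captured as group 1) occurs again later,
# so every character in an accepted word is unique.
pattern = re.compile(r"^(?!.*?(.).*?\1)[lrsa]*m[lrsa]*$")

for word in ["lams", "lamas", "slam", "lamb"]:
    print(word, bool(pattern.match(word)))
# lams True
# lamas False  <- "a" repeats
# slam True
# lamb False   <- "b" is not in the letter set
```

Because the lookahead forbids every repeat, the same pattern still rejects "lamas" when the intended letter set is "lrsaa"; counting letters in code handles that case.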