
Regex for at least two unique characters

Tags:

.net

regex

I need a regular expression to validate a password.

You can assume that the input contains only lower case letters, a-z. The restriction is that there must be at least two letters which are unique, i.e. letters that each appear exactly once in the input.

Note that I do mean unique characters, not just two different characters. (If that makes sense?)

For example these are ok:

abc
abbc
aabc

These should fail:

aabb    //There are no unique letters.  The 'a' appears twice.
aabbcc  //There are no unique letters
abab    //There are no unique letters
abb     //There is only one unique letter

I know that just looping through the letters would be a much easier solution, but unfortunately I need this as a regex.

I have been trying various combinations of lookaheads etc. but no luck so far.

EDIT:

I have made a little progress. I can now check for a single unique letter, using both a negative look behind and also a negative lookahead. Like this:

(.)(?<!\1.+)(?!.*\1)

I expected I could just put this twice but it's not working. Something similar to:

(.)(?<!\1.+)(?!.*\1)(.)(?<!\2.+)(?!.*\2)
asked Mar 17 '11 15:03 by Buh Buh


1 Answer

This seems to do the trick:

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] values = { "abc", "abbc", "aabc", "aabb", "aabbcc", "abab", "abb" };
      string pattern = @"(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}";
      foreach (string value in values) {
         if (Regex.IsMatch(value, pattern)) {
            Console.WriteLine("{0} valid", value);
         }
         else {   
            Console.WriteLine("{0} invalid", value);
         }
      }
   }
}

produces the output:

abc valid
abbc valid
aabc valid
aabb invalid
aabbcc invalid
abab invalid
abb invalid

as can be seen on Ideone: http://ideone.com/oU7a0

But the regex is a horrific thing!

I'll explain it later, if you want (I have to go now).


EDIT

Okay, here's an explanation of this monstrosity (I hope!):

(?:                # start non-capture group 1
  (.)              #   capture any character, and store it in match group 1
  (?<=             #   start positive look-behind
    ^              #     match the start of the input string
    (?:(?!\1).)    #     if what is captured in match group 1 cannot be seen ahead, match the character
    *              #     repeat the previous zero or more times 
    \1             #     this is the `(.)` we're looking at
  )                #   end positive look-behind
  (?=              #   start positive look-ahead
    (?:(?!\1).)    #     if what is captured in match group 1 cannot be seen ahead, match the character
    *              #     repeat the previous zero or more times 
    $              #     match the end of the input string
  )                #   end positive look-ahead
  .*?              #   match zero or more characters, un-greedy
)                  # end non-capture group 1
{2,}               # match non-capture group 1 at least 2 times

In plain English, it would be a bit like this:

+---                                                      # (
| match and group any character `C` at position `P`,      # (.)
|                                                         #
| and look from the start of the input all the way        #
| to `P` where there can't be any character like `C`      # (?<=^(?:(?!\1).)*\1)
| in between.                                             #
|                                                         #
| Also look from position `P` all the way to the end      #
| of the input where there can't be any character `C`     # (?=(?:(?!\1).)*$)
| in between.                                             #
+---                                                      #
| if the previous isn't matched, consume any character    #
| un-greedy zero or more times (but the previous block    # .*?
| is always tried before this part consumes a character)  #
+---                                                      # )
|                                                         #
|                                                         # 
+----> repeat this at least 2 times                       # {2,}
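
Read that way, the pattern just says "at least two characters occur exactly once". As a sanity check, here is a minimal sketch that compares the regex against the plain counting approach the question mentions. The class name CrossCheck and the helper HasTwoUniqueChars are made up for this illustration and are not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CrossCheck
{
    // The pattern from the answer above, unchanged.
    private const string Pattern = @"(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}";

    // Hypothetical helper: true when at least two characters occur exactly once.
    public static bool HasTwoUniqueChars(string input)
    {
        return input.GroupBy(c => c).Count(g => g.Count() == 1) >= 2;
    }

    public static void Main()
    {
        string[] values = { "abc", "abbc", "aabc", "aabb", "aabbcc", "abab", "abb" };
        foreach (string value in values)
        {
            bool byRegex = Regex.IsMatch(value, Pattern);
            bool byCount = HasTwoUniqueChars(value);
            // Both columns should agree for every input.
            Console.WriteLine("{0}: regex={1}, count={2}", value, byRegex, byCount);
        }
    }
}
```

If the two columns ever disagree for an input without line breaks, the regex (or my reading of it) is wrong.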

EDIT II

Let's say Kobi (K) is standing somewhere on top of a string, "abcYbacZa", containing 9 characters:

              K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and Kobi wants to know if the fourth character, Y (index 3, sitting between positions p3 and p4), is unique in the entire string. Kobi travels with his trusted minion, let's call him Bart (B), who gets the following assignment from Kobi:

Step 1

Kobi: Bart, go back to the start of the input: regex: (?<=^ ... ) (Bart will start at position 0: p0, which is the empty string just before the first a);

B             K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

Step 2

Then look ahead and determine if you can't see the character I've remembered in match group 1, regex: (.), which is the character Y. So at position p0, Bart performs regex: (?!\1). For p0, this holds true: Bart sees the character a, so Y is still unique. Bart advances to the next position p1, where he sees character b: all is still fine, and he makes another step to position p2, and so on: regex: (?:(?!\1).)*.

Step 3

Bart is now at position p3:

            B K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and when he now looks ahead, he does see the character Y, of course, so regex: (?!\1) fails. But that character, the one Kobi is still standing on, is consumed by the last \1 in: regex: (?:(?!\1).)*\1. So after p3, Bart proudly tells Kobi: "Yes, Y is indeed unique while looking behind us!". "Good", says Kobi, "now do the same, but instead of looking behind, look ahead, all the way up to the end of this string we're standing on, and make it snappy!".

Step 4

Bart grumbles something unintelligible, but starts his journey at p4:

              K B
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and when he looks ahead, he sees the character b, so the regex: (?!\1) holds true and the character b is consumed by the dot that follows it, regex: (?!\1). Bart repeats this zero or more times, regex: (?:(?!\1).)*, all the way to the end of the input, regex: (?:(?!\1).)*$. He returns to Kobi again, and tells him: "Yes, when looking ahead, Y is still unique!"

So, the regex:

(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$)

will match any single character that is unique in the string.
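
To see this on the example string, here is a small sketch that collects every match of the single-character pattern. The class name UniqueCharDemo and the method UniqueChars are names I made up for the illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UniqueCharDemo
{
    // The single-unique-character pattern from above, unchanged.
    private const string UniquePattern = @"(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$)";

    // Returns every character of the input that occurs exactly once,
    // in order of appearance (one regex match per unique character).
    public static string[] UniqueChars(string input)
    {
        return Regex.Matches(input, UniquePattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Prints "Y,Z": the only characters that occur once in the example.
        Console.WriteLine(string.Join(",", UniqueChars("abcYbacZa")));
    }
}
```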

Step 5

The regex above can't simply be repeated 2 or more times, since that would only match 2 successive unique characters. So, we'll add a regex: .*? after it:

(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?
                                        ^^^

that will consume the characters b, a and c in this case, and repeat that regex two times or more:

(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}
^^^                                           ^^^^^ 

So at the end, the substring "YbacZ" is matched from "abcYbacZa" because Y and Z are unique.
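
If you want to check the matched substring yourself, something along these lines should do. MatchedSpanDemo and MatchedSpan are illustrative names only, not anything from the framework:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchedSpanDemo
{
    // The full pattern from above, unchanged.
    private const string Pattern = @"(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}";

    // Returns the text of the first match, or null when the input
    // does not contain two unique characters.
    public static string MatchedSpan(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(MatchedSpan("abcYbacZa"));            // YbacZ
        Console.WriteLine(MatchedSpan("abb") ?? "(no match)");  // (no match)
    }
}
```

The match starts at the first unique character and ends at the last one the repetition reaches, which is why the leading "abc" and the trailing "a" are not part of it.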

There you have it, as easy as pie, right? :)

answered Nov 15 '22 10:11 by Bart Kiers