Regex for at least two unique characters

This seems to do the trick:

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] values = { "abc", "abbc", "aabc", "aabb", "aabbcc", "abab", "abb" };
      string pattern = @"(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}";
      foreach (string value in values) {
         if (Regex.IsMatch(value, pattern)) {
            Console.WriteLine("{0} valid", value);
         }
         else {   
            Console.WriteLine("{0} invalid", value);
         }
      }
   }
}

produces the output:

abc valid
abbc valid
aabc valid
aabb invalid
aabbcc invalid
abab invalid
abb invalid

as can be seen on Ideone: http://ideone.com/oU7a0
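As a sanity check, the regex can be compared against a plain counting approach. This is only a sketch: the class name UniqueCheck and the helper HasTwoUniqueChars are made up for illustration and are not part of the original answer. The helper groups the characters and counts how many occur exactly once.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UniqueCheck
{
   // Same pattern as in the answer above.
   public const string Pattern = @"(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}";

   // Hypothetical helper: true when at least two characters occur exactly once.
   public static bool HasTwoUniqueChars(string s)
   {
      return s.GroupBy(c => c).Count(g => g.Count() == 1) >= 2;
   }

   public static void Main()
   {
      string[] values = { "abc", "abbc", "aabc", "aabb", "aabbcc", "abab", "abb" };
      foreach (string value in values) {
         bool byRegex = Regex.IsMatch(value, Pattern);
         bool byCount = HasTwoUniqueChars(value);
         // For these inputs both columns should agree with the output listed above.
         Console.WriteLine("{0}: regex={1}, count={2}", value, byRegex, byCount);
      }
   }
}
```

The two checks are only expected to agree for single-line input: without RegexOptions.Singleline the `.` in the pattern does not match a newline, while the counting helper treats a newline like any other character.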

But the regex is a horrific thing!

I'll explain it later, if you want (I have to go now).

EDIT

Okay, here's an explanation of this monstrosity (I hope!):

(?:                # start non-capture group 1
  (.)              #   capture any character, and store it in match group 1
  (?<=             #   start positive look-behind
    ^              #     match the start of the input string
    (?:(?!\1).)    #     if what is captured in match group 1 cannot be seen ahead, match the character
    *              #     repeat the previous zero or more times 
    \1             #     this is the `(.)` we're looking at
  )                #   end positive look-behind
  (?=              #   start positive look-ahead
    (?:(?!\1).)    #     if what is captured in match group 1 cannot be seen ahead, match the character
    *              #     repeat the previous zero or more times 
    $              #     match the end of the input string
  )                #   end positive look-ahead
  .*?              #   match zero or more characters, un-greedy
)                  # end non-capture group 1
{2,}               # match non-capture group 1 at least 2 times
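A commented layout like this can also be handed to .NET directly by passing RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace and treats `#` as the start of a comment. The following is a minimal sketch; the class name CommentedPattern and the field name TwoUnique are made up for illustration, and I've kept each quantifier right next to the thing it repeats to stay on the safe side.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPattern
{
   // The same regex as above, spread over several lines with comments.
   public static readonly Regex TwoUnique = new Regex(@"
      (?:                    # start of the repeated group
        (.)                  #   capture one character in group 1
        (?<=^(?:(?!\1).)*\1) #   look-behind: no earlier occurrence of it
        (?=(?:(?!\1).)*$)    #   look-ahead: no later occurrence of it
        .*?                  #   lazily skip over non-unique characters
      ){2,}                  # at least two repetitions
      ", RegexOptions.IgnorePatternWhitespace);

   public static void Main()
   {
      Console.WriteLine(TwoUnique.IsMatch("abbc"));  // prints True
      Console.WriteLine(TwoUnique.IsMatch("aabb"));  // prints False
   }
}
```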

In plain English, it would be a bit like this:

+---                                                      # (
| match and group any character `C` at position `P`,      # (.)
|                                                         #
| and look from the start of the input all the way        #
| to `P` where there can't be any character like `C`      # (?<=^(?:(?!\1).)*\1)
| in between.                                             #
|                                                         #
| Also look from position `P` all the way to the end      #
| of the input where there can't be any character `C`     # (?=(?:(?!\1).)*$)
| in between.                                             #
+---                                                      #
| if the previous isn't matched, consume any character    #
| un-greedy zero or more times (but the previous block    # .*?
| is always tried before this part matches a character)   #
+---                                                      # )
|                                                         #
|                                                         # 
+----> repeat this at least 2 times                       # {2,}

EDIT II

Let's say Kobi (K) is placed somewhere on top of a string, "abcYbacZa", containing 9 characters:

              K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and Kobi wants to know if the fourth character, Y (the one between p3 and p4), is unique in the entire string. Kobi travels with his trusted minion, let's call his minion Bart (B), who gets the following assignment from Kobi:

Step 1

Kobi: Bart, go back to the start of the input: regex: (?<=^ ... ) (Bart will start at position 0: p0, which is the empty string just before the first a);

B             K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

Step 2

Then look ahead and determine if you can't see the character I've remembered in match group 1, regex: (.), which is the character Y. So at position p0, Bart performs regex: (?!\1). For p0, this holds true: Bart sees the character a, so Y is still unique. Bart advances to the next position p1, where he sees character b: all is still fine, and he makes another step to position p2, and so on: regex: (?:(?!\1).)*.

Step 3

Bart is now at position p3:

            B K
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and when he now looks ahead, he does see the character Y, of course, so regex: (?!\1) fails. But that character, the one Kobi is still standing on, is consumed by the last \1 in: regex: (?:(?!\1).)*\1. So after p3, Bart proudly tells Kobi: "Yes, Y is indeed unique while looking behind us!". "Good", says Kobi, "now do the same, but instead of looking behind, look ahead, all the way up to the end of this string we're standing on, and make it snappy!".

Step 4

Bart grumbles something unintelligible, but starts his journey at p4:

              K B
+---+---+---+---+---+---+---+---+---+ 
| a | b | c | Y | b | a | c | Z | a |
+---+---+---+---+---+---+---+---+---+

^   ^   ^   ^   ^   ^   ^   ^   ^   ^
|   |   |   |   |   |   |   |   |   |
p0  p1  p2  p3  p4  p5  p6  p7  p8  p9

and when he looks ahead, he sees the character b, so the regex: (?!\1) holds true and the character b is consumed by regex: .. Bart repeats this zero or more times regex: (?:(?!\1).)* all the way to the end of the input regex: (?:(?!\1).)*$. He returns to Kobi again, and tells him: "Yes, when looking ahead, Y is still unique!"

So, the regex:

(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$)

will match any single character that is unique in the string.
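To see what each half of Bart's journey contributes, the two look-arounds can be tried on their own. This is only a sketch: the class name Halves, the helper Hits and the three constant names are made up for illustration. The look-behind alone accepts every first occurrence of a character, the look-ahead alone accepts every last occurrence, and only a character that passes both is unique.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Halves
{
   // Look-behind only: the character does not occur earlier in the input.
   public const string FirstOccurrence = @"(.)(?<=^(?:(?!\1).)*\1)";
   // Look-ahead only: the character does not occur later in the input.
   public const string LastOccurrence = @"(.)(?=(?:(?!\1).)*$)";
   // Both together: the character is unique.
   public const string Unique = @"(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$)";

   // Hypothetical helper: lists every match as value@index.
   public static string Hits(string input, string pattern)
   {
      return string.Join(" ", Regex.Matches(input, pattern).Cast<Match>()
         .Select(m => m.Value + "@" + m.Index));
   }

   public static void Main()
   {
      Console.WriteLine(Hits("abcYbacZa", FirstOccurrence)); // a@0 b@1 c@2 Y@3 Z@7
      Console.WriteLine(Hits("abcYbacZa", LastOccurrence));  // Y@3 b@4 c@6 Z@7 a@8
      Console.WriteLine(Hits("abcYbacZa", Unique));          // Y@3 Z@7
   }
}
```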

Step 5

The regex above can't simply be repeated 2 or more times, since that would only match 2 successive unique characters. So, we'll add a regex: .*? after it:

(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?
                                        ^^^

that will consume the characters b, a and c in this case, and repeat that regex two times or more:

(?:(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$).*?){2,}
^^^                                           ^^^^^

So at the end, the substring "YbacZ" is matched from "abcYbacZa" because Y and Z are unique.
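The effect of the `.*?` filler can be checked by running the repeated group with and without it. Again, just a sketch: the class name FillerDemo and the constant names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FillerDemo
{
   // Matches a single unique character (see above).
   private const string Unique = @"(.)(?<=^(?:(?!\1).)*\1)(?=(?:(?!\1).)*$)";

   // Repeated as-is: only finds unique characters that sit next to each other.
   public const string WithoutFiller = "(?:" + Unique + "){2,}";
   // With the lazy filler: unique characters may be separated by others.
   public const string WithFiller = "(?:" + Unique + ".*?){2,}";

   public static void Main()
   {
      // Y and Z are not adjacent, so the version without the filler fails.
      Console.WriteLine(Regex.IsMatch("abcYbacZa", WithoutFiller)); // prints False

      Match m = Regex.Match("abcYbacZa", WithFiller);
      Console.WriteLine("{0} at index {1}", m.Value, m.Index); // prints YbacZ at index 3
   }
}
```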

There you have it, as easy as pie, right? :)

answered Nov 15 '22 10:11

Bart Kiers
