Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex to check non-repetition of a set of characters

Suppose I have the set of characters [ABC]. I'm looking for a regex that would match any permutation of the superset except the empty set, i.e.

ABC ACB BAC BCA CAB CBA
AB BC AC CB CA BA
A B C

The regex should (obviously) not match the empty string.

p.s. An alternative way to express the same objective is "match any non-empty string containing each character in the set at most once".

update: The set [ABC] is just an example, for the real set may also be bigger. With this question I was hoping to find a "general" solution rather than a particular one for [ABC].

like image 755
CAFxX Avatar asked Apr 26 '12 11:04

CAFxX


5 Answers

I believe this can be solved by regular expressions. Use this regex:

/^([ABC])(?!\1)([ABC])?(?!\1|\2)[ABC]?$/

Let me know if you need an online demo on this.

like image 94
anubhava Avatar answered Nov 17 '22 11:11

anubhava


Thanks to your answers (especially anubhava's and codaddict's) I was able to find this solution, that I think is pretty elegant because it allows to type the set only once:

\b(([ABC])(?!.*\2))+\b

the \b are needed to match full words; omitting them will also find subwords respecting the required property. To match a full string, you'd obviously do:

^(([ABC])(?!.*\2))+$
like image 41
CAFxX Avatar answered Nov 17 '22 13:11

CAFxX


Try:

([ABC]?)(?!.*\1)([ABC]?)(?!.*\2)[ABC]?

It is just [ABC]? repeated 3 times with the added check of the negative lookahead assertion that does not permit duplicate characters.

Note that this would work only if the input set is all unique.

See it work

like image 1
codaddict Avatar answered Nov 17 '22 13:11

codaddict


"A((B?C?)|(C?B?))|B((A?C?)|(C?A?))|C((A?B?)|(B?A?))"

It is A|B|C and each of them can be followed by an pair of optional values

 A(B?C?) matches A, AB,AC and ABC
 A(C?B?) matches A, AC,AB and ACB 

but not ACAC, AA or ACC. The cases with B or C as first character are equivalent.

For longer Strings, this will get ugly soon. A better approach would be (pseudocode):

 string.sort().matches ("^A?B?C?$") && string.length > 0
like image 1
user unknown Avatar answered Nov 17 '22 12:11

user unknown


This is not something that regular expressions are good at. You might just want to create a list of permutations instead, and then produce all unique substrings.

something like:

def matches(s, characters):
    if len(s) != len(set(s)):
        return False # not unique sequence of characters
    return set(s).issubsetof(set(characters))
like image 1
Daren Thomas Avatar answered Nov 17 '22 13:11

Daren Thomas