Is it possible to write a regular expression which will match any subset of a given set of charactersa1 ... an
?
I.e. it should match any string where any of these characters appears at most once, there are no other characters and the relative order of the characters doesn't matter.
Some approaches that arise at once:
1. [a1,...,an]*
or (a1|a2|...|an)*
- this allows multiple presence of characters
2. (a1?a2?...an?)
- no multiple presence, but relative order is important - this matches any subsequence but not subset.
3. ($|a1|...|an|a1a2|a2a1|...|a1...an|...|an...a1)
, i.e. write all possible subsequences (just hardcode all matching strings :)) of course, not acceptable.
I also have a guess that it may be theoretically impossible, because during parsing the string we will need to remember which character we have already met before, and as far as I know regular expressions can check out only right-linear languages.
Any help will be appreciated. Thanks in advance.
This doesn't really qualify for the language-agnostic
tag, but...
^(?:(?!\1)a1()|(?!\2)a2()|...|(?!\n)an())*$
see a demo on ideone.com
The first time an element is matched, it gets "checked off" by the capturing group following it. Because the group has now participated in the match, a negative lookahead for its corresponding backreference (e.g., (?!\1)
) will never match again, even though the group only captured an empty string. This is an undocumented feature that is nevertheless supported in many flavors, including Java, .NET, Perl, Python, and Ruby.
This solution also requires support for forward references (i.e., a reference to a given capturing group (\1
) appearing in the regex before the group itself). This seems to be a little less widely supported than the empty-groups gimmick.
Can't think how to do it with a single regex, but this is one way to do it with n
regexes: (I will usr 1
2
... m
n
etc for your a
s)
^[23..n]*1?[23..n]*$
^[13..n]*2?[13..n]*$
...
^[12..m]*n?[12..m]*$
If all the above match, your string is a strict subset of 12..mn
.
How this works: each line requires the string to consist exactly of:
a particular one
a particular one
a particular one
If this passes when every element in turn is considered as a particular one
, we know:
as required.
for completeness I should say that I would only do this if I was under orders to "use regex"; if not, I'd track which allowed elements have been seen, and iterate over the characters of the string doing the obvious thing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With