
Regex: Match specific characters in any order without more occurrences of each character than specified

Tags: regex

I have a list of characters, e.g. {o, b, c, c, d, o, f}.

If a string contains characters that are not in that list, I don't want it to be a match. If a string contains more occurrences of a character than there are occurrences of that character in that list, I don't want it to be a match.

The characters in the string may occur in any order, and not every character has to appear. In the above example "foo" should be a match but not "fooo".

I have for instance narrowed the above example down to (o{0,2}b?c{0,2}d?f?), but that doesn't quite work since the order matters in that regex. I get a match for "oof" but not for "foo".
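The order dependence described above can be reproduced with a short check. This is only a minimal sketch: Python and `re.fullmatch` are assumptions here, since the question does not name a regex engine.

```python
import re

# The ordered attempt from the question: each character class may only
# appear in this fixed sequence (o, then b, then c, then d, then f).
ordered = r"(o{0,2}b?c{0,2}d?f?)"

def matches_ordered(s):
    """Return True if the whole string fits the ordered pattern."""
    return re.fullmatch(ordered, s) is not None

for candidate in ["oof", "foo"]:
    print(candidate, matches_ordered(candidate))
```

"oof" follows the o-before-f order baked into the pattern, while "foo" does not, which is why only the first one matches.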

vladakolic asked Mar 14 '14 17:03



1 Answer

As gview says, regex is not the right tool. However, if your regex engine supports lookahead, you can use this:

(?=(?:[^o]*o){0,2}[^o]*$)(?=(?:[^c]*c){0,2}[^c]*$)(?=[^b]*b?[^b]*$)(?=[^d]*d?[^d]*$)(?=[^f]*f?[^f]*$)^[obcdf]+$

It's a bit long but very simple:

The string is matched with ^[obcdf]+$ (note the use of anchors).

The lookaheads (?=...) are only checks ("followed by" assertions); they consume no characters:

(?=(?:[^o]*o){0,2}[^o]*$) # no more than 2 o until the end

(?=[^b]*b?[^b]*$) # no more than 1 b until the end

Each lookahead subpattern describes the whole string, from the start to the end.
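For an arbitrary list of characters, the same pattern can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the helper names `build_pattern` and `fits_counter` are hypothetical, and the `Counter`-based check is included only as a non-regex comparison, in the spirit of the remark that regex is not the right tool.

```python
import re
from collections import Counter

def build_pattern(allowed):
    """Build the lookahead regex for a list of allowed characters.

    Hypothetical helper: one lookahead per distinct character caps its
    count, and the final character class rejects anything else.
    """
    counts = Counter(allowed)
    lookaheads = []
    for ch, n in sorted(counts.items()):
        c = re.escape(ch)
        # At most n occurrences of ch between the start and the end.
        lookaheads.append(f"(?=(?:[^{c}]*{c}){{0,{n}}}[^{c}]*$)")
    char_class = "".join(re.escape(ch) for ch in sorted(counts))
    return re.compile("".join(lookaheads) + f"^[{char_class}]+$")

def fits_counter(s, allowed):
    """Non-regex comparison: same rule expressed with multiset subtraction."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means s never exceeds the allowed count of any character.
    return bool(s) and not (Counter(s) - Counter(allowed))

allowed = "obccdof"
pattern = build_pattern(allowed)
for s in ["foo", "fooo", "oof", "fox"]:
    print(s, bool(pattern.match(s)), fits_counter(s, allowed))
```

Both checks reject the empty string, because the regex uses `+`. Note that in Python `$` also matches just before a trailing newline, so input should be stripped first (or `\Z` used) if that matters.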

Casimir et Hippolyte answered Sep 22 '22 02:09