 

Regular expression to determine if line contains 1-4 specific characters

Tags:

regex

I'm not sure of the best way to word this, so I'll just give an example. Say I have the characters a, b, c, d. I want to accept any string that has 0 or 1 of each character, in any order. Strings such as "ab", "abcd" and "dcab" would all be acceptable. Is it possible to do this with regular expressions alone?

The only thing I have come up with is ((a|b|c|d){0,1}){0,4}. However, this doesn't work, as it also accepts strings such as "aaaa".

segfault asked Feb 11 '23 23:02



1 Answer

The question looks very easy, but it really isn't. Here you go:

^([abcd])(?:(?!\1)([abcd]))?(?:(?!\1|\2)([abcd]))?(?:(?!\1|\2|\3)([abcd]))?$


A reduced version, without the non-capturing wrappers:

^([abcd])((?!\1)[abcd])?((?!\1|\2)[abcd])?((?!\1|\2|\3)[abcd])?$


Pattern Explanation:

  • ^ Asserts that we are at the start.
  • ([abcd]) The first character must be one from the character class (a, b, c or d). It is captured by the first capturing group.
  • ((?!\1)[abcd])? The second character must come from the character class but must differ from the first character; the negative lookahead (?!\1) enforces this. The character is captured, and the whole group is optional: if a second character is present, it must satisfy this condition.
  • ((?!\1|\2)[abcd])? Any character from the class other than the first or second character. It is captured, and the group is optional.
  • ((?!\1|\2|\3)[abcd])? Any character from the class other than the first, second or third character. It is captured, and the group is optional.
  • $ Asserts that we are at the end.
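
To try the reduced pattern outside a demo site, here is a minimal sketch in Python, assuming its standard re module (the answer itself doesn't name a language, and the function name is_valid is only illustrative). Python treats a backreference to a group that did not participate as a failed match, so the lookaheads behave as described above.

```python
import re

# Reduced pattern from the answer: each optional group must differ from
# every character captured before it.
PATTERN = re.compile(
    r'^([abcd])((?!\1)[abcd])?((?!\1|\2)[abcd])?((?!\1|\2|\3)[abcd])?$'
)

def is_valid(s: str) -> bool:
    """Return True if s consists of 1-4 distinct characters from a, b, c, d."""
    return PATTERN.match(s) is not None

# The first three samples are accepted, the rest are rejected.
for sample in ["ab", "abcd", "dcab", "aaaa", "aba", "abcde", ""]:
    print(repr(sample), is_valid(sample))
```

Note that this pattern requires at least one character, so the empty string is rejected.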

OR

^(?:(?!(.).*\1)[abcd])+$

Pattern Explanation:

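  • (?!(.).*\1) Negative lookahead: (.) captures the character at the current position and .*\1 looks for that same character later in the line. The assertion succeeds only if the character does not occur again.
  • (?:(?!(.).*\1)[abcd])+ Matches one or more characters from the character class (a, b, c or d), each only if it is not repeated later in the line. Because the class has four members, it matches at most four characters (1 to 4).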
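
The lookahead version can be checked the same way. This is a small sketch assuming Python's standard re module; the helper name has_unique_abcd is made up for illustration.

```python
import re

# At every position, the negative lookahead rejects a character that
# occurs again later in the line.
LOOKAHEAD = re.compile(r'^(?:(?!(.).*\1)[abcd])+$')

def has_unique_abcd(s: str) -> bool:
    """Return True if s is non-empty, uses only a-d, and repeats no character."""
    return LOOKAHEAD.match(s) is not None

# "ab" and "dcab" are accepted; the others are rejected.
for sample in ["ab", "dcab", "abb", "aba", "abe"]:
    print(repr(sample), has_unique_abcd(sample))
```

Unlike the first pattern, this one does not need a new group for each extra character, so extending the character class is enough to allow more letters.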


OR

Using the PCRE verbs (*SKIP)(*F):

^.*(.).*\1.*$(*SKIP)(*F)|^[abcd]+$

Pattern Explanation:

  • ^.*(.).*\1.*$ Matches every line that contains a repeated character.
  • (*SKIP)(*F) Forces the match to fail once the first alternative has matched such a line, and (*SKIP) stops the engine from retrying anywhere inside the text it has just consumed. The pattern to the right of the | operator is therefore only tried on lines that contain no repeated character.
  • ^ Asserts that we are at the start.
  • [abcd]+ One or more characters from the character class. Because lines with repeated characters have already been skipped, it won't match aba, bba, etc.
  • $ Asserts that we are at the end.
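
Python's standard re module has no backtracking verbs such as (*SKIP)(*F), so this pattern can't be run there unchanged. As a rough sketch, the same two-step logic can be written with two ordinary patterns: first reject lines that contain a repeated character, then check what is left against ^[abcd]+$. This is a substitute technique, not the PCRE pattern itself, and the names used are illustrative.

```python
import re

# Step 1: the left-hand alternative, matching lines with a repeated character.
HAS_REPEAT = re.compile(r'^.*(.).*\1.*$')
# Step 2: the right-hand alternative, applied only to the surviving lines.
ONLY_ABCD = re.compile(r'^[abcd]+$')

def accept_line(line: str) -> bool:
    """Emulate the skip-then-match logic with two separate checks."""
    if HAS_REPEAT.match(line):
        return False  # plays the role of (*SKIP)(*F): discard the line
    return ONLY_ABCD.match(line) is not None

# Only "abcd" and "ba" are kept.
lines = ["abcd", "aba", "bba", "ba", "abx"]
print([line for line in lines if accept_line(line)])
```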

Avinash Raj answered Feb 15 '23 11:02
