
.NET regex matching

Tags: c#, regex, matching

Broadly: how do I match a word with separate regex rules for a) the beginning, b) the whole word, and c) the end?

More specifically: How do I match an expression of length >= 1 that has the following rules:

  1. It cannot have any of: ! @ #
  2. It cannot begin with a space or =
  3. It cannot end with a space

I tried:

^[^\s=][^!@#]*[^\s]$

But the ^[^\s=] part consumes the first character and only checks it against space and =. Hence this also matches words that begin with '!' or '@' or '#' (e.g. '#ab' or '@aa'). It also forces the word to have at least 2 characters (one beginning character that is not a space or =, and one non-space character at the end).
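The behaviour described above can be reproduced with a small C# sketch. The pattern is the first attempt, unchanged; the FirstAttempt class and its Matches method are names made up for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the first attempt from the question.
static class FirstAttempt
{
    // First attempt, copied verbatim: the leading class only rejects
    // whitespace and '=', so '!', '@' and '#' slip through in position 0.
    private const string Pattern = @"^[^\s=][^!@#]*[^\s]$";

    public static bool Matches(string candidate) =>
        Regex.IsMatch(candidate, Pattern);
}
```

FirstAttempt.Matches("#ab") and FirstAttempt.Matches("@aa") both return true, which is the unwanted match, while FirstAttempt.Matches("a") returns false because the pattern needs one character for the first class and another for the last.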

I got to:

^[^\s=(!@#)]\1*$

for a regex matching the first two rules. But how do I disallow trailing spaces while still allowing words of length 1?

raj asked Apr 09 '11 02:04


2 Answers

Cameron's solution is both accurate and efficient (and should be used for any production code where speed needs to be optimized). The answer presented here is less efficient, but demonstrates a general approach for applying logic using regular expressions.

You can use multiple positive and negative lookahead assertions, all applied at one location in the target string (typically the beginning), to apply multiple logical constraints to a match. The commented regex below demonstrates how easy this is to do for this example case. You do need to understand how the regex engine actually matches (and doesn't match) to come up with the correct expressions, but it's not hard once you get the hang of it.

foundMatch = Regex.IsMatch(subjectString, @"
    # Match 'word' meeting multiple logical constraints.
    ^             # Anchor to start of string.
    (?=[^!@#]*$)  # It cannot have any of: ! @ #,      AND
    (?![ =])      # It cannot begin with a space or =, AND
    (?!.*\s$)     # It cannot end with a space,        AND
    .{1,}         # length >= 1 (ok to match special 'word')
    \z            # Anchor to end of string.
    ",
    RegexOptions.IgnorePatternWhitespace);

This application of "regex-logic" is frequently used for complex password validation.

ridgerunner answered Oct 05 '22 11:10


Your first attempt was very close. You only need to exclude more characters for the first and last parts, and make the last two parts optional:

^[^\s=!@#](?:[^!@#]*[^\s!@#])?$

This ensures that all three sections will not include any of !@#. Then, if the word is more than one character long, it will need to end with a not-space, with only select characters filling the space in-between. This is all enforced properly because of the ^ and $ anchors.
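As a rough check of the pattern above, here is a minimal C# sketch. The WordRule class and IsValid method are hypothetical names chosen for the example; only the pattern itself comes from this answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from this answer.
static class WordRule
{
    // First char: not whitespace, '=', '!', '@' or '#'.
    // Optional tail: any run without '!', '@', '#', ending in a
    // character that is also not whitespace.
    private static readonly Regex Pattern =
        new Regex(@"^[^\s=!@#](?:[^!@#]*[^\s!@#])?$");

    public static bool IsValid(string candidate) =>
        Pattern.IsMatch(candidate);
}
```

With this, WordRule.IsValid returns true for "a", "ab" and "a b", and false for "", "#ab", "=ab", " ab", "ab " and "a#b". One caveat: in .NET, $ also matches just before a final newline, so "ab\n" is accepted as well; if that matters, replacing $ with \z is one way to rule it out.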

I'm not quite sure what your second example matched, since the () should be taken as literal characters when embedded within a character class, not as a capturing group.
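The point about parentheses can be seen with a short C# sketch that uses only the character class from your second attempt. The CharClassParens class and AllowsFirstChar method are illustrative names, not anything from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: tests only the leading character class
// from the second attempt, without the trailing \1*.
static class CharClassParens
{
    // Inside [...] the '(' and ')' are ordinary characters, so this
    // negated class also rejects a leading parenthesis.
    private const string FirstChar = @"^[^\s=(!@#)]";

    public static bool AllowsFirstChar(string candidate) =>
        Regex.IsMatch(candidate, FirstChar);
}
```

CharClassParens.AllowsFirstChar("ab") returns true, while "(ab" and ")ab" both return false, so the parentheses act as two more excluded characters. Since no capturing group is created, the \1 in the original pattern has no group to refer to.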

Cameron answered Oct 05 '22 11:10