
How to upper-case a regular expression pattern?

Tags: c#, regex

I'm currently looking at a project that relies heavily on regular expressions. The input strings are already upper-cased, and the regex IgnoreCase flag has been set. Internally, though, the MS regex engine then converts everything back to lower case, which is an unnecessary performance hit. Changing the regular expression patterns to upper case and removing the flag improves performance.

Does anyone know of a library or algorithm that can upper-case the regex patterns without affecting the group names or escaped characters?

gouldos asked May 27 '11 09:05




1 Answer

You could search for lowercase letters that are not preceded by an odd number of backslashes:

(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+

Then pass the match to a MatchEvaluator, uppercase it and replace the text in the original string. I don't know C#, so this might not work right away (code snippet taken and modified a bit from RegexBuddy), but it's a start:

string resultString = Regex.Replace(subjectString, 
    @"(?<!                 # Negative lookbehind:
       (?<!\\)(?:\\\\)*\\  # Is there no odd number of backslashes
      |                    # nor
       \(\?<?\p{L}*        # (?<tags or (?modifiers
      )                    # before the current position?
      \p{Ll}+              # Then match one or more letters", 
    new MatchEvaluator(ComputeReplacement), RegexOptions.IgnorePatternWhitespace);

public static string ComputeReplacement(Match m) {
    // Called once per match; m.Value holds the matched run of lowercase letters
    return m.Value.ToUpperInvariant();  // culture-independent uppercasing
}
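For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name PatternUpperCaser and the method name UpperCase are made up for illustration; only Regex, RegexOptions and ToUpperInvariant come from .NET. It is a starting point rather than a complete solution: as far as I can tell it does not protect Unicode category names such as \p{Ll} (which would become \p{LL}) or named backreferences such as \k<name>, so patterns using those would need extra handling.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for this example only).
public static class PatternUpperCaser
{
    // Runs of lowercase letters that are neither escaped
    // nor part of an opening (?<name or (?modifiers construct.
    private static readonly Regex LowercaseRun = new Regex(
        @"(?<!                 # not preceded by
           (?<!\\)(?:\\\\)*\\  #   an odd number of backslashes
          |                    #   or
           \(\?<?\p{L}*        #   an opening (?<name or (?modifiers
          )
          \p{Ll}+              # one or more lowercase letters",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the pattern with every unprotected lowercase run upper-cased.
    public static string UpperCase(string pattern)
    {
        return LowercaseRun.Replace(pattern, m => m.Value.ToUpperInvariant());
    }
}
```

Calling PatternUpperCaser.UpperCase(@"(?<word>foo)\s+bar") gives (?<word>FOO)\s+BAR: the group name and the \s escape are left alone, and the converted pattern matches "FOO  BAR" without the IgnoreCase flag.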

Explanation:

(?<!        # assert that the string before the current position doesn't match:
 (?<!\\)    # assert that we start at the first backslash in the series
 (?:\\\\)*  # match an even number of backslashes
 \\         # match one backslash
)
\p{Ll}+     # now match any sequence of lowercase letters
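To see the odd/even backslash check from the explanation in isolation, here is a small sketch that only lists what the simpler pattern matches, without replacing anything. BackslashParityDemo and FindUnescapedRuns are names invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class (name chosen for this example only).
public static class BackslashParityDemo
{
    // The lookbehind from the explanation, without the group-name alternative.
    private const string UnescapedLowercase = @"(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+";

    // Returns every run of lowercase letters that is not directly
    // preceded by an odd number of backslashes.
    public static string[] FindUnescapedRuns(string pattern)
    {
        return Regex.Matches(pattern, UnescapedLowercase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the input \d+foo\\bar\n this returns foo and bar. The d and the n each follow a single backslash, so they are skipped; bar follows two backslashes (an escaped backslash), so it counts as ordinary text.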
Tim Pietzcker answered Sep 28 '22 12:09