I'm currently working on a project that makes heavy use of regular expressions. The input strings are already upper-cased, and the patterns are run with the regex IgnoreCase flag set. The internal MS regex engine, though, then converts everything back to lower case, which is an unnecessary performance hit. Changing the regular expression patterns to upper case and removing the flag improves performance.
Does anyone know of a library or algorithm that can upper-case regex patterns without affecting group names or escaped characters?
You could search for lowercase letters that are not preceded by an odd number of backslashes:
(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+
Then pass the match to a MatchEvaluator, uppercase it, and replace the text in the original string. I don't know C#, so this might not work right away (code snippet taken and modified a bit from RegexBuddy), but it's a start:
string resultString = Regex.Replace(subjectString,
    @"(?<!                 # Negative lookbehind:
        (?<!\\)(?:\\\\)*\\ # Is there no odd number of backslashes
      |                    # nor
        \(\?<?\p{L}*       # (?<tags or (?modifiers
      )                    # before the current position?
      \p{Ll}+              # Then match one or more letters",
    new MatchEvaluator(ComputeReplacement), RegexOptions.IgnorePatternWhitespace);
public string ComputeReplacement(Match m) {
    // m.Value holds the matched run of lowercase letters.
    // A literal "\0" is not expanded inside a MatchEvaluator, so use the Match itself.
    return m.Value.ToUpperInvariant();
}
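To see how the group-name protection plays out, here is a small self-contained sketch of the same replacement. The class name PatternUpperCaser and the sample pattern are made up for illustration, and a lambda stands in for the named MatchEvaluator method; treat it as a starting point rather than a finished tool.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class PatternUpperCaser
{
    private static readonly Regex LowerRun = new Regex(
        @"(?<!                   # not preceded by:
            (?<!\\)(?:\\\\)*\\   #   an odd number of backslashes
          |                      #   or
            \(\?<?\p{L}*         #   an opening (?<name or (?modifiers
          )
          \p{Ll}+                # a run of lowercase letters",
        RegexOptions.IgnorePatternWhitespace);

    // Uppercases literal lowercase letters, leaving escapes such as \d
    // and names such as (?<word> untouched.
    public static string UpperCase(string pattern)
    {
        return LowerRun.Replace(pattern, m => m.Value.ToUpperInvariant());
    }
}
```

With this, PatternUpperCaser.UpperCase(@"(?<word>[a-z]+)\s\d+") gives (?<word>[A-Z]+)\s\d+, which can then be run against upper-cased input without the IgnoreCase flag. Note that the lookbehind only covers the two cases listed in it: letters inside constructs such as \k<name> or \p{Ll} are still uppercased, so patterns that use those would need additional alternatives in the lookbehind.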
Explanation of the first, shorter pattern:
(?<!             # assert that the string before the current position doesn't match:
    (?<!\\)      # assert that we start at the first backslash in the series
    (?:\\\\)*    # match an even number of backslashes
    \\           # match one backslash
)
\p{Ll}+          # now match any sequence of lowercase letters
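The backslash counting is easiest to trust after trying it on one, two, and three backslashes. The sketch below (class and method names are hypothetical) lists the lowercase runs the shorter pattern would pick out for uppercasing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class BackslashParityDemo
{
    private static readonly Regex UnescapedLower =
        new Regex(@"(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+");

    // Returns the lowercase runs that are NOT directly preceded
    // by an odd number of backslashes.
    public static string[] UnescapedRuns(string pattern)
    {
        return UnescapedLower.Matches(pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

For \d (one backslash) and \\\d (three) the result is empty, because the d belongs to an escape. For \\d (two backslashes) the result contains d, because the backslashes form an escaped backslash and the d is a literal letter.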