 

How to outperform this regex replacement?

After considerable measurement, I have identified a hotspot in one of our Windows services that I'd like to optimize. We are processing strings that may contain runs of consecutive whitespace, and we'd like to reduce each run to a single space. We use a static compiled regex for this task:

private static readonly Regex 
    regex_select_all_multiple_whitespace_chars = 
        new Regex(@"\s+",RegexOptions.Compiled);

and then use it as follows:

var cleanString=
    regex_select_all_multiple_whitespace_chars.Replace(dirtyString.Trim(), " ");

This line is invoked several million times and is proving to be fairly expensive. I've tried to write something better, but I'm stumped. Given how modest the regex's processing requirements are, surely there's something faster. Could unsafe processing with pointers speed things up further?

Edit:

Thanks for the amazing set of responses to this question... most unexpected!

spender asked Apr 27 '10 10:04



1 Answer

This is about three times faster:

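// Requires: using System.Text;
private static string RemoveDuplicateSpaces(string text) {
  // Pre-size the builder; the result is never longer than the input.
  StringBuilder b = new StringBuilder(text.Length);
  bool space = false; // true while inside a run of ' ' characters
  foreach (char c in text) {
    if (c == ' ') {
      // Keep only the first space of each run.
      if (!space) b.Append(c);
      space = true;
    } else {
      b.Append(c);
      space = false;
    }
  }
  return b.ToString();
}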
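
One caveat worth checking before swapping this in: the loop above collapses only the literal ' ' character and reduces leading and trailing runs to one space rather than removing them, whereas the original call trims the input and `\s+` also matches tabs and newlines. Below is a minimal sketch of a variant that aims to reproduce the regex behaviour in a single pass. The names `WhitespaceCollapser` and `CollapseWhitespace` are hypothetical, it assumes `char.IsWhiteSpace` is an acceptable stand-in for `\s` on your input, and it has not been benchmarked here.

```csharp
using System;
using System.Text;

public static class WhitespaceCollapser
{
    // Hypothetical helper (not from the question or the answer above):
    // trims both ends and collapses every whitespace run into one ' '.
    // Assumption: null is treated as empty, unlike dirtyString.Trim(),
    // which would throw on null.
    public static string CollapseWhitespace(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        var b = new StringBuilder(text.Length);
        bool pendingSpace = false; // a whitespace run was seen but not yet written

        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                // Defer the space. A leading run is dropped because
                // nothing has been written to the builder yet.
                pendingSpace = b.Length > 0;
            }
            else
            {
                if (pendingSpace) b.Append(' ');
                b.Append(c);
                pendingSpace = false;
            }
        }

        // A trailing run leaves pendingSpace set but is never written,
        // so the end of the string is trimmed as well.
        return b.ToString();
    }
}
```

For an input such as "  a \t b\n\nc  " this returns "a b c", the same result as the `Trim()` plus `Replace` call in the question. Deferring the space until the next non-whitespace character is what removes the need for a separate trim pass.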
Guffa answered Sep 19 '22 21:09