Replace whitespace outside quotes using regular expression

Using C#, I need to prepare a search string for a SQL Server query that uses the LIKE operator, by replacing all whitespace outside quotes with a % character. Example:

Input:

my "search text"

Output:

%my%search text%

Any help would be appreciated. I can handle input strings with an odd number of quotes before replacing the text.

Andreas asked May 24 '11 14:05


2 Answers

Instead of using a regex, use a simple state machine: loop over each character, tracking whether you are inside or outside quotes, and replace whitespace only while you are outside.
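
A minimal sketch of that state machine follows. It is not part of the original answer. The class and method names are hypothetical, and it assumes that only double quotes delimit quoted text, that there are no escaped quotes, and that every whitespace character (not just a space) outside quotes becomes its own %.

```csharp
using System;
using System.Text;

static class LikePatternBuilder
{
    // Hypothetical helper: replaces each whitespace character that lies
    // outside double quotes with '%'. Quote characters are kept as-is.
    public static string ReplaceOutsideQuotes(string input)
    {
        var result = new StringBuilder(input.Length);
        bool inQuotes = false;   // the single piece of state we track

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a quoted run
                result.Append(c);
            }
            else if (!inQuotes && char.IsWhiteSpace(c))
            {
                result.Append('%');     // whitespace outside quotes
            }
            else
            {
                result.Append(c);       // everything else is copied unchanged
            }
        }

        return result.ToString();
    }
}
```

Calling LikePatternBuilder.ReplaceOutsideQuotes on the question's input, my "search text", yields my%"search text". An unbalanced quote simply leaves the rest of the string in the "in" state, so nothing after it is replaced.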

Oded answered Oct 11 '22 23:10


If you have to use a regex, you can do it if you are sure that all quotes are correctly balanced, and if there are no escaped quotes (\") in the string (it is possible to account for those, too, but it makes the regex even more complicated).

resultString = Regex.Replace(subjectString, 
    @"[\ ]       # Match a space (brackets for legibility)
    (?=          # Assert that the string after the current position matches...
     [^""]*      # any non-quote characters
     (?:         # followed by...
      ""[^""]*   # one quote, followed by 0+ non-quotes
      ""[^""]*   # a second quote and 0+ non-quotes
     )*          # any number of times, ensuring an even number of quotes
    $            # until the end of the string
    )            # End of lookahead", 
    "%", RegexOptions.IgnorePatternWhitespace);

This examines the remainder of the string to assert that an even number of quotes follows the current space character. The advantage of lookahead (thanks Alan Moore!) is that it's more portable than lookbehind: most regex flavors, with the exception of .NET and a few others, don't support indefinite repetition inside lookbehind assertions. It may well be faster, too.
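
Note that the replacement only touches the spaces, so the quotes themselves remain in the result. As a hedged sketch of how the question's expected output (%my%search text%) might be reached, the example below uses the same lookahead in compact form, then strips the quotes and wraps the result in %. The class and method names are hypothetical, and it carries the same assumptions as above: balanced quotes, no escaped quotes, and only literal spaces are matched.

```csharp
using System;
using System.Text.RegularExpressions;

static class LikeSearchText
{
    // Same lookahead as the commented version, written on one line
    // (so RegexOptions.IgnorePatternWhitespace is not needed).
    private static readonly Regex SpaceOutsideQuotes =
        new Regex(@" (?=[^""]*(?:""[^""]*""[^""]*)*$)");

    // Hypothetical helper: assumes balanced, unescaped double quotes.
    public static string ToLikePattern(string searchText)
    {
        // Step 1: spaces followed by an even number of quotes become '%'.
        string replaced = SpaceOutsideQuotes.Replace(searchText, "%");

        // Step 2: the quotes are still present here, so remove them
        // and surround the whole pattern with '%'.
        return "%" + replaced.Replace("\"", "") + "%";
    }
}
```

For the input my "search text", step 1 gives my%"search text" and ToLikePattern returns %my%search text%. Any %, _ or [ already present in the user's text would still act as a LIKE wildcard; this sketch does not escape them.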

The original solution involving lookbehind is as follows:

resultString = Regex.Replace(subjectString, 
    @"(?<=       # Assert that the string up to the current position matches...
    ^            # from the start of the string
     [^""]*      # any non-quote characters
     (?:         # followed by...
      ""[^""]*   # one quote, followed by 0+ non-quotes
      ""[^""]*   # a second quote and 0+ non-quotes
     )*          # any number of times, ensuring an even number of quotes
    )            # End of lookbehind
    [ ]          # Match a space (brackets for legibility)", 
    "%", RegexOptions.IgnorePatternWhitespace);

Tim Pietzcker answered Oct 12 '22 00:10