
Regexp skip pattern

Tags:

c#

regex

Problem

I need to replace every asterisk ('*') with a percent sign ('%'). Asterisks inside square brackets should be left unchanged.

Example

    [Test]
    public void Replace_all_asterisks_outside_the_square_brackets()
    {
        var input = "Hel[*o], w*rld!";
        var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");

        Assert.AreEqual("Hel[*o], w%rld!", output);
    }
Andrii Startsev asked Mar 01 '11 11:03



1 Answer

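Try using a negative lookahead, which matches an asterisk only when it is not followed by a closing bracket (with no other bracket in between):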

\*(?![^\[\]]*\])
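
As a quick check, the lookahead pattern can be run against the question's input. This is only a minimal sketch: the class name LookaheadDemo and the helper ReplaceAsterisks are made up for illustration. It also shows a limitation: the lookahead only looks for a following ']', so an asterisk before a stray ']' with no opening '[' is left alone as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: replaces every '*' that is NOT followed by a ']'
    // with no other bracket in between.
    public static string ReplaceAsterisks(string input)
    {
        return Regex.Replace(input, @"\*(?![^\[\]]*\])", "%");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceAsterisks("Hel[*o], w*rld!")); // Hel[*o], w%rld!
        Console.WriteLine(ReplaceAsterisks("a*b]"));            // a*b]  (the unbalanced ']' blocks the match)
    }
}
```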

Here's a somewhat more robust solution, which handles [] blocks better and also copes with escaped \[ characters:

string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
\\.                 # Match an escaped character. (to skip over it)
|
\[                  # Match a character class
    (?:\\.|[^\]])*  # which may also contain escaped characters (skipped as well)
\]
|
(?<Asterisk>\*)     # Match `*` and add it to a group.
";

text = Regex.Replace(text, pattern,
    match => match.Groups["Asterisk"].Success ? "%" : match.Value,
    RegexOptions.IgnorePatternWhitespace);

If you don't care about escaped characters, you can simplify it to:

\[          # Skip a character class
    [^\]]*  # until the first ']'
\]
|
(?<Asterisk>\*)

Without comments, this can be written as: @"\[[^\]]*\]|(?<Asterisk>\*)".
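
For completeness, here is a rough sketch of the simplified pattern applied to the input from the question. The class name SimplifiedDemo and the helper ReplaceAsterisks are hypothetical names chosen for this example. The callback is the same one used in the longer version above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplifiedDemo
{
    // Either a whole [...] block (returned unchanged) or a lone asterisk
    // (captured in the named group "Asterisk").
    private static readonly Regex Pattern =
        new Regex(@"\[[^\]]*\]|(?<Asterisk>\*)");

    // Hypothetical helper wrapping the replacement.
    public static string ReplaceAsterisks(string input)
    {
        return Pattern.Replace(input,
            match => match.Groups["Asterisk"].Success ? "%" : match.Value);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceAsterisks("Hel[*o], w*rld!")); // Hel[*o], w%rld!
    }
}
```

Unlike the plain lookahead, this version needs an opening '[' before it skips anything, so "a*b]" becomes "a%b]".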

To understand why this works, consider how Regex.Replace scans the input: at each position in the string it tries to match the regex. If the match fails, it advances by one character. If it succeeds, it replaces the match and continues right after it.
Here, the [...] blocks act as dummy matches: matching a whole block lets the engine skip over the asterisks inside it, so the last alternative only ever matches asterisks outside brackets. The callback makes the final decision by checking whether the Asterisk group matched. If it did, it returns "%"; otherwise it returns the match unchanged.
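
The scanning behaviour described above can be made visible by listing the matches instead of replacing them. The following is an illustrative sketch only (MatchTraceDemo and Trace are made-up names). It uses Regex.Matches with the simplified pattern and reports what the callback would do for each match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchTraceDemo
{
    // Hypothetical helper: describes each match and the decision the
    // replacement callback would make for it.
    public static List<string> Trace(string input)
    {
        var lines = new List<string>();
        foreach (Match match in Regex.Matches(input, @"\[[^\]]*\]|(?<Asterisk>\*)"))
        {
            string action = match.Groups["Asterisk"].Success ? "replace with %" : "keep as-is";
            lines.Add($"index {match.Index}: \"{match.Value}\" -> {action}");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Trace("Hel[*o], w*rld!"))
            Console.WriteLine(line);
        // index 3: "[*o]" -> keep as-is
        // index 10: "*" -> replace with %
    }
}
```

The asterisk at index 4 never shows up as its own match, because the scan had already consumed it as part of the "[*o]" block.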

Kobi answered Sep 30 '22 13:09
