 

How to encode strings for Regular Expression in .NET?

Tags:

c#

regex

encoding

I need to dynamically build a Regex that matches any of a given set of keywords, like this:

string regex = "(some|predefined|words";
foreach (Product product in products)
    regex += "|" + product.Name; // Need to encode product.Name because it can include special characters.
regex += ")";

Is there some kind of Regex.Encode that does this?
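To show why the names need encoding, here is a minimal sketch comparing a pattern built from raw keywords with one built through `Regex.Escape`. The class `UnescapedDemo`, its `Build` helper, and the sample names are hypothetical illustrations, not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapedDemo
{
    // Hypothetical helper: wraps one keyword in "(...)", optionally escaping it first.
    public static Regex Build(string keyword, bool escape)
    {
        string body = escape ? Regex.Escape(keyword) : keyword;
        return new Regex("(" + body + ")");
    }

    static void Main()
    {
        // Unescaped, "." is a wildcard, so ".NET" also matches "xNET".
        Console.WriteLine(Build(".NET", escape: false).IsMatch("xNET")); // True
        Console.WriteLine(Build(".NET", escape: true).IsMatch("xNET"));  // False

        // Unescaped, "(2)" becomes a group matching the digit 2,
        // so the literal text "C# (2)" is not found.
        Console.WriteLine(Build("C# (2)", escape: false).IsMatch("C# (2)")); // False
        Console.WriteLine(Build("C# (2)", escape: true).IsMatch("C# (2)"));  // True

        // An unbalanced parenthesis in a name makes the raw pattern invalid.
        try
        {
            Build("C# (beta", escape: false);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Invalid pattern");
        }
    }
}
```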

asked Jul 25 '10 18:07 by randomguy




1 Answer

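You can use Regex.Escape, which escapes the regex metacharacters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) so that each name is matched literally. For example: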

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class Test
{
    static void Main()
    {
        string[] predefined = { "some", "predefined", "words" };
        string[] products = { ".NET", "C#", "C# (2)" };

        IEnumerable<string> escapedKeywords = 
            predefined.Concat(products)
                      .Select(Regex.Escape);
        Regex regex = new Regex("(" + string.Join("|", escapedKeywords) + ")");
        Console.WriteLine(regex);
    }
}

Output:

(some|predefined|words|\.NET|C\#|C\#\ \(2\))
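One detail worth keeping in mind when applying such a pattern: .NET tries the alternatives of `(a|b|...)` from left to right and keeps the first one that matches at a given position, so with the order above the text "C# (2)" would match just "C#". The sketch below, assuming you want the longest keyword to win, sorts the keywords by descending length before escaping them. The `KeywordMatcher` class and the sample text are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordMatcher
{
    // Builds "(k1|k2|...)" from escaped keywords. Longer keywords come first
    // because .NET takes the first alternative that matches at a position.
    public static Regex Build(params string[] keywords)
    {
        var escaped = keywords
            .OrderByDescending(k => k.Length)
            .Select(Regex.Escape);
        return new Regex("(" + string.Join("|", escaped) + ")");
    }

    static void Main()
    {
        Regex regex = Build("some", "predefined", "words", ".NET", "C#", "C# (2)");
        string text = "We ship C# (2) on .NET, plus some extras.";

        // Prints each keyword occurrence in order: "C# (2)", ".NET", "some".
        foreach (Match m in regex.Matches(text))
            Console.WriteLine(m.Value);
    }
}
```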

Or, without LINQ, using string concatenation in a loop as in your original code (something I try to avoid):

string regex = "(some|predefined|words";
foreach (Product product in products)
    regex += "|" + Regex.Escape(product.Name);
regex += ")";
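If you want to keep the loop but avoid creating a new string on every iteration, a `StringBuilder` is the usual alternative. This is a minimal sketch; the `Product` class here is a hypothetical stand-in for the question's type (only `Name` is assumed), and `LoopBuilder.BuildPattern` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's Product type; only Name is used.
public class Product
{
    public string Name { get; }
    public Product(string name) { Name = name; }
}

public static class LoopBuilder
{
    // Same pattern as the concatenation loop, appended to one StringBuilder.
    public static string BuildPattern(IEnumerable<Product> products)
    {
        var sb = new StringBuilder("(some|predefined|words");
        foreach (Product product in products)
            sb.Append('|').Append(Regex.Escape(product.Name));
        sb.Append(')');
        return sb.ToString();
    }

    static void Main()
    {
        var products = new List<Product>
        {
            new Product(".NET"), new Product("C#"), new Product("C# (2)")
        };
        string pattern = BuildPattern(products);
        Console.WriteLine(pattern); // (some|predefined|words|\.NET|C\#|C\#\ \(2\))

        Regex regex = new Regex(pattern);
        Console.WriteLine(regex.IsMatch("I use .NET")); // True
    }
}
```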
answered Oct 14 '22 07:10 by Jon Skeet