
Regex for matching C# string literals

Tags: c#, regex

I am trying to write a regular expression that will match a string that contains name-value pairs of the form:

<name> = <value>, <name> = <value>, ...

Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:

regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");

This works well, but of course it fails when the string I am trying to match contains a <value> with an escaped quote. I am struggling to work out how to solve this. I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' named value below:

difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
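For reference, a minimal reproduction of the failure, as a sketch. The class and method names (`LazyQuoteDemo`, `Capture`) are made up for illustration; only the pattern itself comes from the question. The lazy `(.*?)` stops at the first quote it sees, which here is the escaped one:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuoteDemo
{
    // Applies the pattern from the question for a given field name and
    // returns the captured value, or an empty string when nothing matches.
    public static string Capture(string input, string fieldName)
    {
        var regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");
        Match m = regex.Match(input);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Verbatim string: "" stands for a single quote character.
        string input = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";

        Console.WriteLine(Capture(input, "easy"));      // prints: one
        Console.WriteLine(Capture(input, "difficult")); // prints: \\\a\b\'\
    }
}
```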

I would appreciate a decent explanation with your answers, I want to learn, rather than copy ;-)

ColinE asked Feb 10 '11 05:02


2 Answers

Try this to capture the key and value:

(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")

As a bonus, it also works on verbatim strings.

C# examples: https://dotnetfiddle.net/vQP4rn

Here's an annotated version:

string pattern = @"
(\w+)\s*=\s*    # key =
(               # Capturing group for the string
    @""               # verbatim string - match literal at-sign and a quote
    (?:
        [^""]|""""    # match a non-quote character, or two quotes
    )*                # zero times or more
    ""                #literal quote
|               #OR - regular string
    ""              # string literal - opening quote
    (?:
        \\.         # match an escaped character,
        |[^\\""]    # or a character that isn't a quote or a backslash
    )*              # zero or more times
    ""              # string literal - closing quote
)";
MatchCollection matches = Regex.Matches(s, pattern, 
                                        RegexOptions.IgnorePatternWhitespace);

Note that the regular-string branch accepts a backslash followed by any character, whereas C# only allows a fixed set of escape sequences, and it also accepts raw newlines inside the literal. Both are easy to tighten if you need validation, but the pattern should be fine for parsing.
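To tie this back to the question, here is a rough sketch that runs the one-line pattern over the sample input and picks out the literal for a known key. The names `LiteralPairs` and `TryFindLiteral` are hypothetical helpers, not part of any library. No lookahead is needed: the alternation `\\.` consumes a backslash together with the character after it, so an escaped quote can never be mistaken for the closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralPairs
{
    // The one-line pattern from above. Inside a C# verbatim string,
    // "" stands for a single quote character.
    public const string Pattern =
        @"(\w+)\s*=\s*(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")";

    // Looks for `key = <literal>` and returns the raw literal, quotes included.
    public static bool TryFindLiteral(string input, string key, out string literal)
    {
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            if (m.Groups[1].Value == key)
            {
                literal = m.Groups[2].Value;
                return true;
            }
        }
        literal = "";
        return false;
    }

    public static void Main()
    {
        string input = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";

        if (TryFindLiteral(input, "difficult", out string d))
            Console.WriteLine(d); // prints: "\\\a\b\'\"\0\f \t\v"
        if (TryFindLiteral(input, "easy", out string e))
            Console.WriteLine(e); // prints: "one"
    }
}
```

The captured text is still the source form of the literal; turning `\t` into a tab and so on would be a separate unescaping step.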

Kobi answered Oct 03 '22 05:10


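This should match only the string literal part (you can tack on whatever else you want at the beginning or end):

Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");

Each regex backslash has to be doubled again inside a regular C# string, so "\\\\." is the regex \\. (a backslash followed by any character). If you want a pattern that rejects literals spanning several lines (regular C# string literals cannot contain raw line breaks):

Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");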
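As a hedged sketch of the "tack on whatever else you want" part, the single-line literal pattern can be prefixed with a known field name. The pattern is written as a verbatim string here so that the regex token `\\` (one literal backslash) appears as-is. `FieldLookup`, `TryGetValue`, the `\b` anchor, and the use of `Regex.Escape` are illustrative choices, not something the answer prescribes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldLookup
{
    // Body of a regular C# string literal, quotes excluded: either a
    // backslash plus one non-newline character, or any character that is
    // not a backslash, a quote, or a line break.
    private const string Body = @"(?:\\[^\n\r]|[^\\""\n\r])*";

    // Finds `fieldName = "<body>"` and returns the body without the quotes.
    public static bool TryGetValue(string input, string fieldName, out string value)
    {
        // Regex.Escape keeps metacharacters in the name from being
        // interpreted; \b avoids matching the tail of a longer name.
        var regex = new Regex(
            @"\b" + Regex.Escape(fieldName) + @"\s*=\s*""(" + Body + @")""");
        Match m = regex.Match(input);
        value = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        string input = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";

        if (TryGetValue(input, "difficult", out string d))
            Console.WriteLine(d); // prints: \\\a\b\'\"\0\f \t\v
        if (TryGetValue(input, "easy", out string e))
            Console.WriteLine(e); // prints: one
    }
}
```

One limitation of this sketch: it searches for the name anywhere in the input, so a name that also appears inside another field's value could produce a false match.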

helloworld922 answered Oct 03 '22 03:10