I am trying to write a regular expression that will match a string that contains name-value pairs of the form:
<name> = <value>, <name> = <value>, ...
Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:
regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");
This works well, but of course it fails when the string I am trying to match contains a <value> with an escaped quote. I am struggling to work out how to solve this. I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' named value below:
difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
I would appreciate a decent explanation with your answers; I want to learn rather than copy ;-)
Try this to capture the key and value:
(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")
As a bonus, it also works on verbatim strings.
C# examples: https://dotnetfiddle.net/vQP4rn
Here's an annotated version:
string pattern = @"
    (\w+)\s*=\s*        # key =
    (                   # capturing group for the string
        @""             # verbatim string - match a literal at-sign and a quote
        (?:
            [^""]|""""  # match a non-quote character, or two quotes
        )*              # zero or more times
        ""              # closing quote
    |                   # OR - regular string
        ""              # string literal - opening quote
        (?:
            \\.         # match an escaped character,
            |[^\\""]    # or a character that isn't a quote or a backslash
        )*              # zero or more times
        ""              # string literal - closing quote
    )";
MatchCollection matches = Regex.Matches(s, pattern,
    RegexOptions.IgnorePatternWhitespace);
Note that the regular-string branch allows any character to be escaped, unlike C#, and it also allows newlines inside the literal. It should be easy to tighten if you need validation, but it should be fine for parsing.
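To tie this back to the question, here is a minimal sketch that runs the one-line pattern over the sample input and collects each name with its raw literal. The class name `PairDemo` and the helper `ParsePairs` are made up for illustration; only `Regex.Matches` and the pattern itself come from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PairDemo
{
    // The pattern from this answer, written as a C# verbatim string
    // (each quote inside the pattern is doubled).
    const string Pattern = @"(\w+)\s*=\s*(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")";

    // Hypothetical helper: maps each name to its raw literal, quotes included.
    public static Dictionary<string, string> ParsePairs(string input)
    {
        var pairs = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Group 1 is the name, group 2 is the whole string literal.
            pairs[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return pairs;
    }

    static void Main()
    {
        // The sample line from the question.
        string s = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";
        foreach (var pair in ParsePairs(s))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Because every match consumes a whole literal, text inside one value that happens to look like `name = "..."` is not picked up as a separate pair. If you only need one known name, look it up in the returned dictionary.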
This should match only the string literal part (you can tack on whatever else you want to the beginning/end):
Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");
and if you want a pattern that doesn't allow "multi-line" string literals (regular C# string literals cannot span lines):
Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");
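Counting backslashes in a regular C# string is error-prone, because a regex-level `\\` has to be typed as `\\\\`. As a rough sketch, the two patterns can instead be written as verbatim strings, where only the quote needs doubling. The class `SingleLineDemo` and the helper `IsLiteral` are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class SingleLineDemo
{
    // Regex text: "((\\.)|[^\\"])*"
    public static readonly Regex AnyLiteral =
        new Regex(@"""((\\.)|[^\\""])*""");

    // Regex text: "((\\[^\n\r])|[^\\"\n\r])*"  (no raw line breaks allowed)
    public static readonly Regex OneLineLiteral =
        new Regex(@"""((\\[^\n\r])|[^\\""\n\r])*""");

    // Hypothetical helper: true when the match covers the whole text.
    public static bool IsLiteral(Regex regex, string text)
    {
        Match m = regex.Match(text);
        return m.Success && m.Length == text.Length;
    }

    static void Main()
    {
        string escaped = "\"say \\\"hi\\\"\"";   // the text "say \"hi\""
        string multiLine = "\"line1\nline2\"";   // quoted text with a real newline inside

        Console.WriteLine(IsLiteral(AnyLiteral, escaped));
        Console.WriteLine(IsLiteral(OneLineLiteral, escaped));
        Console.WriteLine(IsLiteral(AnyLiteral, multiLine));
        Console.WriteLine(IsLiteral(OneLineLiteral, multiLine));
    }
}
```

Only the last call should report False: the first pattern lets `[^\\"]` match a raw newline, while the second excludes `\n` and `\r` from both branches.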