I'm trying to find all of the quoted text on a single line.
Example:
"Some Text" "Some more Text" "Even more text about \"this text\""
I need to get:
"Some Text"
"Some more Text"
"Even more text about \"this text\""
\"[^\"\r]*\"
gives me everything except for the last one, because of the escaped quotes.
I have read about \"[^\"\\]*(?:\\.[^\"\\]*)*\"
working, but I get an error at run time:
parsing ""[^"\]*(?:\.[^"\]*)*"" - Unterminated [] set.
How do I fix this?
To represent a double quotation mark in a string literal, use the escape sequence \". The single quotation mark (') can be represented without an escape sequence. The backslash (\) must be followed with a second backslash (\\) when it appears within a string.
To escape a single or double quote in a string, use a backslash \ character before each single or double quote in the contents of the string, e.g. 'that\'s it' .
Double quotes inside verbatim strings can be escaped by using two sequential double quotes "" to represent one double quote " in the resulting string: var str = @"""I don't think so,"" he said.";
What you've got there is an example of Friedl's "unrolled loop" technique, but you seem to have some confusion about how to express it as a string literal. Here's how it should look to the regex compiler:
"[^"\\]*(?:\\.[^"\\]*)*"
The initial "[^"\\]* matches a quotation mark followed by zero or more of any characters other than quotation marks or backslashes. That part alone, along with the final ", will match a simple quoted string with no embedded escape sequences, like "this" or "".
If it does encounter a backslash, \\. consumes the backslash and whatever follows it, and [^"\\]* (again) consumes everything up to the next backslash or quotation mark. That part gets repeated as many times as necessary until an unescaped quotation mark turns up (or it reaches the end of the string and the match attempt fails).
Note that this will match "foo\"-" in \"foo\"-"bar" (the match starts at the first, escaped quote and runs through the quote that was meant to open "bar"). That may seem to expose a flaw in the regex, but it doesn't; it's the input that's invalid. The goal was to match quoted strings, optionally containing backslash-escaped quotes, embedded in other text--why would there be escaped quotes outside of quoted strings? If you really need to support that, you have a much more complex problem, requiring a very different approach.
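To see that edge case for yourself, here is a minimal sketch. The class and method names (StrayEscapeDemo, FirstMatch) are made up for illustration. The pattern is written as a C# verbatim string, in which "" stands for a single quotation mark and backslashes need no extra escaping for the compiler.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class StrayEscapeDemo
{
    // Regex source as the engine sees it: "[^"\\]*(?:\\.[^"\\]*)*"
    private static readonly Regex Quoted =
        new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

    // Returns the first quoted string found, or an empty string if there is none.
    public static string FirstMatch(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Calling StrayEscapeDemo.FirstMatch(@"\""foo\""-""bar""") should hand back the text from the first quotation mark through the one that was meant to open "bar", because the regex has no way of knowing that the first quote was preceded by a stray backslash.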
As I said, the above is how the regex should look to the regex compiler. But you're writing it in the form of a string literal, and those tend to treat certain characters specially--i.e., backslashes and quotation marks. Fortunately, C#'s verbatim strings save you the hassle of having to double-escape backslashes; you just have to escape each quotation mark with another quotation mark:
Regex r = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");
So the rule is double quotation marks for the C# compiler and double backslashes for the regex compiler--nice and easy. This particular regex may look a little awkward, with the three quotation marks at either end, but consider the alternative:
Regex r = new Regex("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
In Java, you always have to write them that way. :-(
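Putting the pieces together, here is a small sketch that runs the regex over the line from the question. The class and member names (QuotedStrings, Extract, Compiles) are my own, not anything from the framework. It also keeps the literal from the question around: in a regular string "\\" collapses to a single backslash, so the regex compiler sees \] as an escaped bracket, which would explain the "Unterminated [] set" error.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class QuotedStrings
{
    // Verbatim literal: "" is one quotation mark, backslashes are doubled only for the regex.
    public const string VerbatimPattern = @"""[^""\\]*(?:\\.[^""\\]*)*""";

    // Regular literal: every quotation mark and every backslash is escaped for C# as well.
    public const string EscapedPattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";

    // The literal from the question; the regex compiler receives "[^"\]*(?:\.[^"\]*)*"
    public const string QuestionPattern = "\"[^\"\\]*(?:\\.[^\"\\]*)*\"";

    // True if the pattern can be compiled; invalid patterns throw ArgumentException.
    public static bool Compiles(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // Collects every quoted string on the line, escaped quotes included.
    public static List<string> Extract(string line)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(line, VerbatimPattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

VerbatimPattern and EscapedPattern are the same string at run time, so which one you use is purely a readability choice. Extract on the example line should return the three quoted strings, while Compiles(QuotedStrings.QuestionPattern) should come back false.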
Regex for capturing strings (with \ for character escaping), for the .NET engine:
(?>(?(STR)(?(ESC).(?<-ESC>)|\\(?<ESC>))|(?!))|(?(STR)"(?<-STR>)|"(?<STR>))|(?(STR).|(?!)))+
Here, a "friendly" version:
(?>                        | atomic (non-backtracking) group
   (?(STR)                 | if (STRING MODE) then
      (?(ESC)              |    if (ESCAPE MODE) then
         .(?<-ESC>)        |       match any char and exit escape mode (pop ESC)
         |                 |    else
         \\(?<ESC>)        |       match '\' and enter escape mode (push ESC)
      )                    |    endif
      |                    | else
      (?!)                 |    fail (this alternative cannot match)
   )                       | endif
   |                       | -- OR
   (?(STR)                 | if (STRING MODE) then
      "(?<-STR>)           |    match '"' and exit string mode (pop STR)
      |                    | else
      "(?<STR>)            |    match '"' and enter string mode (push STR)
   )                       | endif
   |                       | -- OR
   (?(STR)                 | if (STRING MODE) then
      .                    |    match any character
      |                    | else
      (?!)                 |    fail (this alternative cannot match)
   )                       | endif
)+                         | REPEATS FOR EVERY CHARACTER
Based on the examples at http://tomkaminski.com/conditional-constructs-net-regular-expressions. It relies on balancing the quotes. I use it with great success. Use it with the Singleline flag.
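For anyone who wants to try it, a minimal sketch of how the pattern might be wired up in C#. The class and method names (BalancedQuoteScanner, Extract) are invented for this example. The pattern is copied as-is into a verbatim string, so each " in it is written as "".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public static class BalancedQuoteScanner
{
    // STR and ESC are named groups used as flags: pushing a capture sets the mode,
    // popping it (the (?<-NAME>) form) clears it again.
    private static readonly Regex Scanner = new Regex(
        @"(?>(?(STR)(?(ESC).(?<-ESC>)|\\(?<ESC>))|(?!))|(?(STR)""(?<-STR>)|""(?<STR>))|(?(STR).|(?!)))+",
        RegexOptions.Singleline); // Singleline lets '.' match newlines inside a string

    // Returns every quoted string found in the text, quotes included.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        foreach (Match m in Scanner.Matches(text))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Two things I noticed while tracing the pattern by hand, so treat them as observations rather than guarantees: the pattern never checks that STR is empty at the end, so an unterminated string seems to match up to the end of the input, and two quoted strings with nothing between them appear to come back as a single match.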
To play around with regexes, I recommend Rad Software Regular Expression Designer, which has a nice "Language Elements" tab with quick access to some basic instructions. It's based on .NET's regex engine.