I want to build a simple regex that covers quoted strings, including any escaped quotes within them. For instance,
"This is valid"
"This is \" also \" valid"
Obviously, something like
"([^"]*)"
does not work, because it matches up to the first escaped quote.
What is the correct version?
I suppose the answer would be the same for other escaped characters (by just replacing the respective character).
By the way, I am aware of the "catch-all" regex
"(.*?)"
but I try to avoid it whenever possible, because, not surprisingly, it runs somewhat slower than a more specific one.
Firstly, the double quote character is nothing special in regex; it's just another character, so it doesn't need escaping from the perspective of regex. However, because Java uses double quotes to delimit String constants, if you want to create a string in Java with a double quote in it, you must escape the quote.
The backslash character ( \ ) is the escape character. It can be used to denote an escaped character, a string literal, or one of the set of supported special characters. Use a double backslash ( \\ ) to denote a literal backslash.
With some variations depending on the engine, regex usually defines a word character as a letter, digit or underscore. A word boundary \b detects a position where one side is such a character, and the other is not.
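As a rough illustration of the word-boundary behaviour described above, here is a minimal Java sketch. The class name `WordBoundaryDemo` and the helper `countWholeWord` are made up for this example, and it assumes Java's default (ASCII) definition of a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: counts occurrences of `word` that have a
    // word boundary (\b) on both sides.
    public static int countWholeWord(String word, String text) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "concat" has a letter before "cat" and "cat_1" has an underscore
        // after it, so neither has a boundary on both sides.
        System.out.println(countWholeWord("cat", "cat concat cat_1 cat.")); // prints 2
    }
}
```

Note that the underscore counts as a word character here, which is why `cat_1` is not treated as a whole-word match.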
If you're looking for a single space, that would be " " (one space). If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).
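The difference between the quantifiers matters because a single space followed by an asterisk also matches the empty string, whereas the plus form requires at least one space. A small sketch in Java (the class and method names are invented for illustration):

```java
import java.util.regex.Pattern;

class SpaceQuantifierDemo {
    // True when the whole input consists of one or more spaces.
    public static boolean isOneOrMoreSpaces(String s) {
        return Pattern.matches(" +", s);
    }

    // True when the whole input consists of zero or more spaces.
    public static boolean isZeroOrMoreSpaces(String s) {
        return Pattern.matches(" *", s);
    }

    public static void main(String[] args) {
        System.out.println(isOneOrMoreSpaces("   "));  // true
        System.out.println(isOneOrMoreSpaces(""));     // false
        System.out.println(isZeroOrMoreSpaces(""));    // true
    }
}
```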
Here is one that I've used in the past:
("[^"\\]*(?:\\.[^"\\]*)*")
This will capture quoted strings, along with any escaped quote characters, and exclude anything that doesn't appear in enclosing quotes.
For example, the pattern will capture "This is valid"
and "This is \" also \" valid"
from this string:
"This is valid" this won't be captured "This is \" also \" valid"
This pattern will not match the following string, which has no closing quote:
"I don't \"have\" a closing quote
It also allows for additional escape codes in the string; for example, it will match
"hello world!\n"
Of course, you'll have to escape the pattern to use it in your code, like so:
"(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"