Regular expression to match escaped characters (quotes)

I want to build a simple regex that matches quoted strings, including any escaped quotes within them. For instance,

"This is valid"
"This is \" also \" valid"

Obviously, something like

"([^"]*)"

does not work, because the match ends at the first escaped quote.
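To make the failure concrete, here is a minimal sketch in Python (the language is an assumption; the question does not name one). The names naive and sample are only illustrative.

```python
import re

# The naive pattern from the question: a quote, any run of non-quote
# characters, then a quote.
naive = re.compile(r'"([^"]*)"')

# Raw string, so the backslashes are kept literally in the sample text.
sample = r'"This is \" also \" valid"'

m = naive.search(sample)
# The match ends at the quote that follows the first backslash,
# so only the fragment  "This is \"  is returned.
print(m.group(0))
```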

What is the correct version?

I suppose the answer would be the same for other escaped characters (by just replacing the respective character).

By the way, I am aware of the "catch-all" regex

"(.*?)"

but I try to avoid it whenever possible, because, not surprisingly, it runs somewhat slower than a more specific one.

asked Jun 29 '11 18:06 by PNS

1 Answer

Here is one that I've used in the past:

("[^"\\]*(?:\\.[^"\\]*)*")

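This will capture quoted strings, along with any escaped quote characters, and exclude anything that doesn't appear in enclosing quotes. It works by consuming runs of characters that are neither a quote nor a backslash ([^"\\]*) and by treating each backslash together with the character that follows it as a single unit (\\.), so an escaped quote can never be taken as the closing quote.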

For example, the pattern will capture "This is valid" and "This is \" also \" valid" from this string:

"This is valid" this won't be captured "This is \" also \" valid"

This pattern will not match the unterminated string "I don't \"have\" a closing quote, and it allows other escape codes inside the string (e.g., it will match "hello world!\n").
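The behaviour described above can be checked with a short sketch. Python is assumed here, since no language was specified, and the names QUOTED, text, unclosed and with_escape are only illustrative.

```python
import re

# The pattern from this answer, written as a raw string so the
# backslashes reach the regex engine unchanged.
QUOTED = re.compile(r'("[^"\\]*(?:\\.[^"\\]*)*")')

# Raw triple-quoted string: the backslashes and the apostrophe are literal.
text = r'''"This is valid" this won't be captured "This is \" also \" valid"'''
matches = QUOTED.findall(text)
for m in matches:
    print(m)  # prints each quoted string, escaped quotes included

# No closing quote: the pattern finds nothing at all.
unclosed = r'''"I don't \"have\" a closing quote'''
print(QUOTED.search(unclosed))

# Other escape codes (a literal backslash followed by n) are accepted too.
with_escape = r'"hello world!\n"'
print(QUOTED.fullmatch(with_escape) is not None)
```

The unrolled form [^"\\]*(?:\\.[^"\\]*)* gives the engine only one way to consume each character, which is the usual reason for preferring it over a lazy catch-all.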

Of course, to use the pattern as a double-quoted string literal in your code, you'll have to escape its quotes and backslashes, like so:

"(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
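Whether this extra escaping is needed depends on the language. As a sketch, assuming Python, a raw string literal lets the pattern be written exactly as shown earlier, and it produces the same pattern text as the escaped form. The names escaped and raw are only illustrative.

```python
import re

# Escaped form, as an ordinary double-quoted string literal.
escaped = "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"

# Raw string literal: no additional escaping of backslashes is needed,
# and single-quote delimiters avoid escaping the double quotes.
raw = r'("[^"\\]*(?:\\.[^"\\]*)*")'

# Both literals yield the same pattern text.
print(escaped == raw)

pattern = re.compile(raw)
found = pattern.findall(r'say "a \" b" now')
print(found)
```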
answered Sep 21 '22 10:09 by arcain