How can I match double-quoted strings with escaped double-quote characters?

I need a Perl regular expression to match a string. I'm assuming only double-quoted strings, that a \" is a literal quote character and NOT the end of the string, and that a \\ is a literal backslash character and should not escape a quote character. If it's not clear, some examples:

"\""    # string is 1 character long, contains double quote
"\\"    # string is 1 character long, contains backslash
"\\\""  # string is 2 characters long, contains backslash and double quote
"\\\\"  # string is 2 characters long, contains two backslashes

I need a regular expression that can recognize all 4 of these possibilities, and all other simple variations on those possibilities, as valid strings. What I have now is:

/".*[^\\]"/ 

But that's not right - it won't match any of those except the first one. Can anyone give me a push in the right direction on how to handle this?

Chris Lutz asked Jan 26 '09 20:01


2 Answers

/"(?:[^\\"]|\\.)*"/

This is almost the same as Cal's answer, but has the advantage of matching strings containing escape codes such as \n.

The ?: makes the group non-capturing, so the parenthesized expression is not saved for backreferences. It can be removed without changing what the pattern matches.
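
As a quick check, the pattern above can be run against the four examples from the question. This is only a sketch: the \A and \z anchors, the helper name is_quoted_string, and the three extra samples are assumptions added here so that each input is tested as one complete string.

```perl
use strict;
use warnings;

# Pattern from the answer above, anchored with \A and \z (an assumption added
# here) so that the whole input must be a single double-quoted string.
my $quoted = qr/\A"(?:[^\\"]|\\.)*"\z/;

# Hypothetical helper: returns 1 if $text is one complete quoted string.
sub is_quoted_string {
    my ($text) = @_;
    return $text =~ $quoted ? 1 : 0;
}

# Single-quoted heredoc, so every backslash below is taken literally.
# The first four lines are the examples from the question; the last three
# are extra cases (an escape code and two unterminated strings).
my @samples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
"a\nb"
"\"
"abc
END

for my $s (@samples) {
    printf "%-8s %s\n", $s, is_quoted_string($s) ? 'valid' : 'invalid';
}
```

The first five samples should be reported as valid and the last two as invalid, because in those two the opening quote is never closed by an unescaped quote.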

NOTE: as pointed out by Louis Semprini, this is limited to 32 KB texts due to a recursion limit built into Perl's regex engine (which unfortunately silently returns a failure when hit, instead of crashing loudly).
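
One common way to reduce pressure on that limit is to "unroll" the alternation, so that runs of ordinary characters are consumed by a plain character-class star and the group only repeats once per escape sequence. The sketch below is not part of the original answer; the variable names and the anchors are assumptions, and whether it actually sidesteps the limit depends on the Perl version and on how many escapes the text contains.

```perl
use strict;
use warnings;

# Pattern from the answer: the group is entered once per character.
my $per_char = qr/\A"(?:[^\\"]|\\.)*"\z/;

# Unrolled variant (an alternative, not from the answer): plain characters
# are eaten by [^\\"]* and the group repeats once per escape sequence.
my $unrolled = qr/\A"[^\\"]*(?:\\.[^\\"]*)*"\z/;

# Single-quoted heredoc, so backslashes are literal.
my @samples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
"a\nb"
"\"
"abc
END

# Both patterns are meant to accept and reject exactly the same inputs.
for my $s (@samples) {
    my $old = $s =~ $per_char ? 1 : 0;
    my $new = $s =~ $unrolled ? 1 : 0;
    printf "%-8s per-char=%d unrolled=%d\n", $s, $old, $new;
}

# A long string with no escapes: the unrolled group never iterates here.
my $long = '"' . ('x' x 100_000) . '"';
print "long string: ", ($long =~ $unrolled ? "matched" : "no match"), "\n";
```

A text with more escape sequences than the limit allows would still make the group repeat that many times, so this reduces the problem rather than removing it.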

j_random_hacker answered Oct 25 '22 00:10


How about this?

/"([^\\"]|\\\\|\\")*"/ 

This matches zero or more of the following: a character that is neither a backslash nor a quote, OR two backslashes, OR a backslash followed by a quote.
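
A small sketch of how this pattern behaves on the examples from the question. The anchors, the helper name accepts, and the extra "a\nb" sample are assumptions added for illustration.

```perl
use strict;
use warnings;

# Pattern from this answer, anchored with \A and \z (an assumption added
# here) so that the whole input must be a single double-quoted string.
my $pattern = qr/\A"([^\\"]|\\\\|\\")*"\z/;

# Hypothetical helper: returns 1 if $text is accepted by the pattern.
sub accepts {
    my ($text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Single-quoted heredoc, so backslashes are literal. The first four lines
# are the question's examples; the fifth contains a backslash followed by n.
my @samples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
"a\nb"
END

for my $s (@samples) {
    printf "%-8s %s\n", $s, accepts($s) ? 'accepted' : 'rejected';
}
```

The four examples from the question are accepted. The fifth sample is rejected, because the only escapes this pattern allows are a doubled backslash and a backslash followed by a quote.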

Cal answered Oct 25 '22 01:10