I need a Perl regular expression to match a string. I'm assuming only double-quoted strings, that a \" is a literal quote character and NOT the end of the string, and that a \ is a literal backslash character and should not escape a quote character. If it's not clear, some examples:
"\"" # string is 1 character long, contains dobule quote "\\" # string is 1 character long, contains backslash "\\\"" # string is 2 characters long, contains backslash and double quote "\\\\" # string is 2 characters long, contains two backslashes
I need a regular expression that can recognize all 4 of these possibilities, and all other simple variations on those possibilities, as valid strings. What I have now is:
/".*[^\\]"/
But that's not right - it won't match any of those except the first one. Can anyone give me a push in the right direction on how to handle this?
Firstly, double quote character is nothing special in regex - it's just another character, so it doesn't need escaping from the perspective of regex. However, because Java uses double quotes to delimit String constants, if you want to create a string in Java with a double quote in it, you must escape them.
If you need to use the double quote inside the string, you can use the backslash character. Notice how the backslash in the second line is used to escape the double quote characters. And the single quote can be used without a backslash.
To represent a double quotation mark in a string literal, use the escape sequence \". The single quotation mark (') can be represented without an escape sequence. The backslash (\) must be followed with a second backslash (\\) when it appears within a string.
In order to use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped. Some flavors only use ^ and $ as metacharacters when they are at the start or end of the regex respectively. In those flavors, no additional escaping is necessary. It's usually just best to escape them anyway.
/"(?:[^\\"]|\\.)*"/
This is almost the same as Cal's answer, but has the advantage of matching strings containing escape codes such as \n
.
The ?:
characters are there to prevent the contained expression being saved as a backreference, but they can be removed.
NOTE: as pointed out by Louis Semprini, this is limited to 32kb texts due a recursion limit built into Perl's regex engine (that unfortunately silently returns a failure when hit, instead of crashing loudly).
How about this?
/"([^\\"]|\\\\|\\")*"/
matches zero or more characters that aren't slashes or quotes OR two slashes OR a slash then a quote
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With