Many languages bound a string with some sort of quote, like this:
"Rob Malda is smart."
ANTLR 4 can match such a string with a lexer rule like this:
QuotedString : '"' .*? '"';
Certain characters within the string must be escaped, perhaps like this:
"Rob \"Commander Taco\" Malda is smart."
ANTLR 4 can match this string as well:
EscapedString : '"' ('\\"'|.)*? '"';
(taken from p96 of The Definitive ANTLR 4 Reference)
Here's my problem: Suppose that the character for escaping is the same character as the string delimiter. For example:
"Rob ""Commander Taco"" Malda is smart."
(This is perfectly legal in PowerShell.)
What lexer rule would match this? I would think this would work:
EscapedString : '"' ('""'|.)*? '"';
But it doesn't. The lexer tokenizes the escape character " as the end-of-string delimiter.
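One rough way to see why the backslash rule works but the doubled-quote rule does not is to translate both rules into Python regular expressions. This is an assumed translation (Python's re is a backtracking regex engine, not ANTLR's lexer), and the names BACKSLASH, DOUBLED_LAZY, backslash_input and doubled_input are made up for illustration. The non-greedy *? loop behaves the same way here, though: before every iteration it first tries to close the string.

```python
import re

# Hypothetical regex translations of the two lexer rules (not generated by ANTLR).
# '"' ('\\"'|.)*? '"'  : backslash-escaped quotes, non-greedy loop
BACKSLASH = re.compile(r'"(?:\\"|.)*?"')
# '"' ('""'|.)*? '"'   : doubled quotes, non-greedy loop (the rule from the question)
DOUBLED_LAZY = re.compile(r'"(?:""|.)*?"')

backslash_input = r'"Rob \"Commander Taco\" Malda is smart."'
doubled_input = '"Rob ""Commander Taco"" Malda is smart."'

# A backslash can never close the string, so the escape branch gets to
# consume \" as a unit and the whole input comes back as one match.
print(BACKSLASH.findall(backslash_input))

# The first " of "" already satisfies the closing quote, and the lazy loop
# tries to close before it tries '""', so the input is split into three matches:
# ['"Rob "', '"Commander Taco"', '" Malda is smart."']
print(DOUBLED_LAZY.findall(doubled_input))
```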
Negate certain characters with the ~ operator:
EscapedString : '"' ( '""' | ~["] )* '"';
or, if there can't be line breaks in your string, do:
EscapedString : '"' ( '""' | ~["\r\n] )* '"';
You don't want to use the non-greedy operator; otherwise "" would never be consumed, and "a""b" would be tokenized as "a" and "b" instead of as a single token.
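The same kind of sketch (again an assumed Python regex translation rather than ANTLR itself; GREEDY, LAZY and SINGLE_LINE are hypothetical names) shows the difference between the greedy and non-greedy loop on "a""b", plus the variant that refuses line breaks:

```python
import re

# Hypothetical regex translations of the negated-set rules (not ANTLR itself).
GREEDY = re.compile(r'"(?:""|[^"])*"')           # '"' ( '""' | ~["] )* '"'
LAZY = re.compile(r'"(?:""|[^"])*?"')            # same loop, made non-greedy
SINGLE_LINE = re.compile(r'"(?:""|[^"\r\n])*"')  # '"' ( '""' | ~["\r\n] )* '"'

print(GREEDY.findall('"a""b"'))       # ['"a""b"']: one token, "" consumed
print(LAZY.findall('"a""b"'))         # ['"a"', '"b"']: "" never consumed
print(SINGLE_LINE.findall('"a\nb"'))  # []: a line break cannot appear inside
```

Because ~["] can never match a quote, the only way for the greedy loop to get past a " is the '""' branch, so a lone " can only be the closing delimiter.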
(Don't vote for this answer; vote for @Bart Kiers' answer.)
I'm offering this for completeness, as it's a small piece of a PowerShell grammar. Combining the escape logic from p76 in The Definitive ANTLR 4 Reference with Bart's answer, here are the rules necessary for lexing escaped strings in PowerShell:
EscapedString
    : '"' (Escape | '""' | ~["])* '"'
    | '\'' (Escape | '\'\'' | ~['])* '\''
    | '\u201C' (Escape | .)*? ('\u201D' | '\u2033') // smart quotes
    ;
fragment Escape
    : '\u0060\'' // backtick single-quote
    | '\u0060"' // backtick double-quote
    ;
These rules handle the following four ways to escape quotes within strings in PowerShell:
'Rob ''Commander Taco'' Malda is smart.'
"Rob ""Commander Taco"" Malda is smart."
'Rob `'Commander Taco`' Malda is smart.'
"Rob `"Commander Taco`" Malda is smart."
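As a quick sanity check, the three EscapedString alternatives and the Escape fragment can be approximated with one Python regex and run against the four examples above, plus a smart-quote string. This is a hypothetical translation, not something ANTLR generates, and ESCAPE, POWERSHELL_STRING and examples are illustrative names. re.DOTALL is used because ANTLR's . matches line breaks while Python's does not by default.

```python
import re

# Hypothetical regex sketch of the PowerShell rules above (not ANTLR itself).
ESCAPE = "`['\"]"  # fragment Escape: a backtick followed by ' or "

POWERSHELL_STRING = re.compile(
    '"(?:' + ESCAPE + '|""|[^"])*"'                   # "...": `" or "" escapes
    + "|'(?:" + ESCAPE + "|''|[^'])*'"                # '...': `' or '' escapes
    + "|\u201C(?:" + ESCAPE + "|.)*?[\u201D\u2033]",  # smart quotes, non-greedy
    re.DOTALL,  # ANTLR's '.' also matches line breaks
)

examples = [
    "'Rob ''Commander Taco'' Malda is smart.'",
    '"Rob ""Commander Taco"" Malda is smart."',
    "'Rob `'Commander Taco`' Malda is smart.'",
    '"Rob `"Commander Taco`" Malda is smart."',
    "\u201CRob is smart.\u201D",
]

for text in examples:
    # Each example should come back as exactly one token, so this prints True.
    print(POWERSHELL_STRING.findall(text) == [text])
```

The smart-quote branch keeps the non-greedy loop from the grammar, so it stops at the first closing smart quote (or double prime) it sees.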