I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR Reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:
    fragment ESCAPE_SEQUENCE
        : '\\' '\'' { setText("'"); }
        ;

    STRING
        : '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
          {
            // strip the quotes from the resulting token
            setText(getText().substring(1, getText().length() - 1));
          }
        ;
For example, I would want the input token "'Foo\'s House'" to become the String "Foo's House".
Unfortunately, the setText(...) call in the ESCAPE_SEQUENCE fragment sets the text for the entire STRING token, which is obviously not what I want.
Is there a way to implement this grammar without adding a method to go back through the resulting string and manually replace escape sequences (e.g., with something like setText(escapeString(getText())) in the STRING rule)?
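For reference, the post-processing fallback mentioned above could look roughly like the following in Java. This is only a sketch: the names StringLiterals and unquote are hypothetical, and it assumes the token text still includes its surrounding single quotes and that a backslash escapes whatever single character follows it.

```java
final class StringLiterals {
    private StringLiterals() {}

    /**
     * Hypothetical helper: strips the surrounding quote characters and
     * replaces each backslash-plus-character pair with the character itself
     * (so \' becomes ' and \\ becomes \).
     */
    static String unquote(String tokenText) {
        StringBuilder out = new StringBuilder();
        int end = tokenText.length() - 1; // index of the closing quote
        // Start at 1 to skip the opening quote; stop before the closing quote.
        for (int i = 1; i < end; i++) {
            char c = tokenText.charAt(i);
            if (c == '\\' && i + 1 < end) {
                // Drop the backslash and keep the escaped character verbatim.
                out.append(tokenText.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

In the STRING rule this would be called as setText(StringLiterals.unquote(getText())), which is exactly the extra pass I would like to avoid if the lexer can do the work itself.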
String literal syntax: Use the escape sequence \n to represent a new-line character as part of the string. Use the escape sequence \\ to represent a backslash character as part of the string. You can represent a single quotation mark symbol either by itself or with the escape sequence \'.
String literals may contain any valid characters, including escape sequences such as \n, \t, etc. Octal and hexadecimal escape sequences are technically legal in string literals, but they are less commonly used there than in character constants, and they can inadvertently run on into the text that follows them.
For example, \n is an escape sequence that denotes a newline character.
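As a small illustration in Java (the target used in the question), each of these escape sequences is two characters in the source text but a single character in the resulting string. The class name EscapeDemo is made up for this sketch.

```java
final class EscapeDemo {
    private EscapeDemo() {}

    // Written as backslash + 'n' in source, stored as one new-line character.
    static final String NEWLINE = "\n";

    // Written as two backslashes in source, stored as one backslash.
    static final String BACKSLASH = "\\";

    // In a Java string literal, \' and a bare ' denote the same character.
    static final String QUOTE = "\'";

    /** Returns true when the escaped and unescaped single quote are equal. */
    static boolean quoteFormsMatch() {
        return QUOTE.equals("'");
    }
}
```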
Here is how I accomplished this in the JSON parser I wrote.
    STRING
    @init{StringBuilder lBuf = new StringBuilder();}
        : '"'
          ( escaped=ESC {lBuf.append(getText());}
          | normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);}
          )*
          '"'
          {setText(lBuf.toString());}
        ;

    fragment ESC
        : '\\'
          ( 'n'  {setText("\n");}
          | 'r'  {setText("\r");}
          | 't'  {setText("\t");}
          | 'b'  {setText("\b");}
          | 'f'  {setText("\f");}
          | '"'  {setText("\"");}
          | '\'' {setText("\'");}
          | '/'  {setText("/");}
          | '\\' {setText("\\");}
          | ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
            {setText(ParserUtil.hexToChar(i.getText(), j.getText(),
                                          k.getText(), l.getText()));}
          )
        ;
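The grammar calls ParserUtil.hexToChar, which is not shown here. A minimal sketch of what such a helper might look like, assuming it receives the four hex digits as separate strings and returns a one-character String (the signature is inferred from the call site and from setText taking a String, so treat it as a guess rather than the actual implementation):

```java
final class ParserUtil {
    private ParserUtil() {}

    /**
     * Assumed behavior: joins four hex digit strings such as "0", "0", "4", "1"
     * and returns the single UTF-16 code unit they encode as a String.
     */
    static String hexToChar(String a, String b, String c, String d) {
        // Integer.parseInt with radix 16 accepts upper- and lower-case digits.
        int codeUnit = Integer.parseInt(a + b + c + d, 16);
        return String.valueOf((char) codeUnit);
    }
}
```

With this version, the escape \u0041 in the input would contribute "A" to the token text. Characters outside the Basic Multilingual Plane would still arrive as two separate \u escapes (a surrogate pair), each converted on its own.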