
How to implement C++0x raw string literal?

How can I define a working lexer and parser (e.g. with flex and bison) that support C++0x-style raw string literals?

As you may already know, new string literals in C++0x can be expressed in a very flexible way.

R"<delim>(...)<delim>"; - in this code, <delim> can be almost any short sequence of characters, and no escape characters are needed inside the string.

With an empty delimiter, the parentheses alone mark where the string ends:

R"(I love those who yearn for the impossible. (Von Goethe, "Faust"))";

Blocks of text can be defined simply by repeating the same delimiter sequence at both ends:

R";***************************(
  ; TINY BASIC FOR INTEL 8080  
  ;       VERSION 2.0  
  ;     BY LI-CHEN WANG  
  ; MODIFIED AND TRANSLATED  
  ;    TO INTEL MNEMONICS  
  ;     BY ROGER RAUSKOLB  
  ;     10 OCTOBER, 1976  
  ;       @COPYLEFT  
  ;  ALL WRONGS RESERVED      )
  ;***************************";
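The delimiter matching shown in these examples can be sketched as a small hand-written scanner, which is roughly what a lexer rule would have to do, since the closing sequence depends on text seen earlier and cannot be written as one fixed regular expression. This is only a sketch under assumptions: it follows the form finally standardized in C++11, `R"delim( ... )delim"`, and the names `RawLiteral`, `scan_raw_literal` and `kMaxDelim` are hypothetical, not part of flex, bison, or any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of scanning one raw string literal (hypothetical type).
struct RawLiteral {
    bool ok;           // false if no well-formed literal starts at pos
    std::string body;  // text between "delim(" and ")delim", taken verbatim
    std::size_t end;   // index just past the closing quote (pos on failure)
};

// C++11 caps the delimiter at 16 characters; a custom language may differ.
const std::size_t kMaxDelim = 16;

// Tries to scan a raw string literal starting at src[pos].
RawLiteral scan_raw_literal(const std::string& src, std::size_t pos) {
    assert(pos <= src.size());
    const RawLiteral fail{false, "", pos};
    if (src.compare(pos, 2, "R\"") != 0) return fail;

    // The delimiter runs from just after R" up to the first '('.
    const std::size_t delim_begin = pos + 2;
    const std::size_t open = src.find('(', delim_begin);
    if (open == std::string::npos) return fail;
    const std::string delim = src.substr(delim_begin, open - delim_begin);

    // Delimiters may not contain spaces, backslashes, ')' or
    // whitespace control characters.
    if (delim.size() > kMaxDelim ||
        delim.find_first_of(" \\)\t\n\v\f") != std::string::npos)
        return fail;

    // The literal ends at the first occurrence of )delim"
    const std::string closer = ")" + delim + "\"";
    const std::size_t close = src.find(closer, open + 1);
    if (close == std::string::npos) return fail;  // unterminated literal

    return RawLiteral{true, src.substr(open + 1, close - open - 1),
                      close + closer.size()};
}
```

In a flex-based lexer, a function like this could be called from the action of a rule that matches the `R"` prefix, reading the remaining input by hand. Note that with the C++11 limit kept at 16, a delimiter as long as the one in the block example above would be rejected; a new language is free to choose a larger `kMaxDelim`.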

More information can be found here (Wikipedia) and here (att).

I would like to use this fantastic feature in a language I am developing now.

So, how can I define a proper tokenizer and syntax analyzer to achieve this result?

Thanks in advance for your answers!

Rizo asked Jun 24 '10 20:06



1 Answer

You could preprocess the literals during the lexical analysis stage and transform them into something like a meta token.

Input:  
    int a;  
    char *b = R"....";  

Preprocessed:  
    int a;
    char *b = R*literal[0]*;

Tokenized:  
    INT symbol[0] DELIM  
    CHAR OP_ASTR symbol[1] OP_EQ symbol[2] *literal[0]* DELIM  

Symbol table contents { "a", "b", "R" }  

Literal table contents { "...." }  

Here, literal[0] is a pointer to the original literal text.
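The preprocessing step described above might look roughly like the following pass, which runs before the ordinary tokenizer. It is a minimal sketch, not a definitive implementation: it assumes the C++11 form `R"delim( ... )delim"`, the names `Preprocessed` and `extract_raw_literals` are hypothetical, and it treats every `R"` in the input as the start of a raw literal, so it does not yet skip comments, ordinary strings, or identifiers that end in `R`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Output of the preprocessing pass (hypothetical type).
struct Preprocessed {
    std::string text;                   // source with literals replaced by placeholders
    std::vector<std::string> literals;  // literal table, indexed by placeholder number
};

// Replaces each raw literal with R*literal[i]* and stores its body in the table.
Preprocessed extract_raw_literals(const std::string& src) {
    Preprocessed out;
    std::size_t pos = 0;
    while (pos < src.size()) {
        // Find the next R" prefix and the '(' that ends its delimiter.
        const std::size_t start = src.find("R\"", pos);
        if (start == std::string::npos) break;
        const std::size_t open = src.find('(', start + 2);
        if (open == std::string::npos) break;

        // The closing sequence is ')' + delimiter + '"'.
        const std::string closer =
            ")" + src.substr(start + 2, open - start - 2) + "\"";
        const std::size_t close = src.find(closer, open + 1);
        if (close == std::string::npos) break;  // unterminated: copy the rest as is

        // Copy the text before the literal, then emit the placeholder.
        out.text.append(src, pos, start - pos);
        out.text += "R*literal[" + std::to_string(out.literals.size()) + "]*";
        out.literals.push_back(src.substr(open + 1, close - open - 1));
        pos = close + closer.size();
    }
    // Copy whatever follows the last literal.
    out.text.append(src, pos, std::string::npos);
    return out;
}
```

The resulting `text` contains no raw literals, so it can be handed to a conventional flex/bison pipeline that only needs one simple rule for the `*literal[N]*` placeholder, while the parser looks up the original text in `literals` by index.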

9dan answered Nov 15 '22 06:11
