Flex / Lex Encoding Strings with Escaped Characters

I'll refer to this question for some of the background:

Regular expression for a string literal in flex/lex

I'm having trouble handling input with escaped characters in my lexer. I think it may be related to how the string is encoded, but I'm not sure.

Here is how I am handling string literals in my lexer:

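\"(\\.|[^\\"])*\" {
    /* Copy the matched text without the surrounding quotes. */
    char* text1 = strndup(yytext + 1, strlen(yytext) - 2);
    /* For comparison: a literal that the C compiler has already translated. */
    char* text2 = "text\n";

    printf("value = <%s> <%x>\n", text1, text1);
    printf("value = <%s> <%x>\n", text2, text2);
}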

This outputs the following:

value = <text\n"> <15a1bb0>
value = <text
> <7ac871>

It appears to treat the newline escape as two separate characters, a backslash followed by an n, rather than as a single newline.
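To make the symptom concrete, here is a small C sketch (not part of the original post; `dump_bytes`, `lexer_text`, and `c_text` are names invented for illustration). It compares the bytes the lexer copies out of `yytext` with the bytes the C compiler stores for the literal `"text\n"`:

```c
#include <stdio.h>
#include <string.h>

/* Print every byte of s in hex so that an escape sequence and a real
   control character can be told apart. Hypothetical helper. */
static void dump_bytes(const char *label, const char *s)
{
    printf("%s (len=%zu):", label, strlen(s));
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        printf(" %02x", *p);
    putchar('\n');
}

/* What the lexer sees in the input: a backslash and an 'n', two characters. */
static const char lexer_text[] = "text\\n";

/* What the C compiler stores for "text\n": one newline character. */
static const char c_text[] = "text\n";
```

Calling `dump_bytes("lexer", lexer_text)` and `dump_bytes("C", c_text)` from `main` shows a 6-byte string ending in a backslash and an `n` in the first case, and a 5-byte string ending in a single newline byte in the second.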

What's going on here, and how do I process the text so that it is identical to what C would produce for the same literal?

asked Mar 24 '11 11:03

Dan

1 Answer

Your regexp only matches the backslash escapes in the string; it doesn't translate them into the characters they represent. I prefer to handle this sort of thing with a flex start state and a string-building buffer that accumulates characters. Something like:

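%{
#include <stdlib.h>   /* strtol */
#include <string.h>   /* strdup */
/* StringBuffer and its helpers (ClearBuffer, AppendBufferString,
   AppendBufferChar, BufferData) stand in for whatever growable string
   type you have; error() is likewise your own reporting function.
   The plain-text rule uses + so it never matches the empty string, and
   octal escapes are limited to one to three digits, as in C. */
static StringBuffer strbuf;
%}
%x string
%%

\"                    { BEGIN string; ClearBuffer(strbuf); }
<string>[^\\"\n]+     { AppendBufferString(strbuf, yytext); }
<string>\\n           { AppendBufferChar(strbuf, '\n'); }
<string>\\t           { AppendBufferChar(strbuf, '\t'); }
<string>\\[0-7]{1,3}  { AppendBufferChar(strbuf, strtol(yytext+1, 0, 8)); }
<string>\\[\\"]       { AppendBufferChar(strbuf, yytext[1]); }
<string>\"            { yylval.str = strdup(BufferData(strbuf)); BEGIN INITIAL; return STRING; }
<string>\\.           { error("bogus escape '%s' in string\n", yytext); }
<string>\n            { error("newline in string\n"); }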

This makes what is going on much clearer, makes it easy to add new escape processing code for new escapes, and makes it easy to issue clear error messages when something goes wrong.
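The `StringBuffer` helpers used in the rules are not defined in the answer. Below is a minimal sketch in C of one way they might look. The struct layout, the `buffer_*` function names, and the macro wrappers are all assumptions made for illustration, not the answerer's actual code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; the answer does not define StringBuffer. */
typedef struct {
    char  *data;  /* NUL-terminated contents, NULL before first use */
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* bytes allocated */
} StringBuffer;

/* Ensure there is room for `extra` more bytes plus the terminator. */
static void buffer_reserve(StringBuffer *b, size_t extra)
{
    size_t need = b->len + extra + 1;
    if (need <= b->cap)
        return;
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need)
        cap *= 2;
    char *p = realloc(b->data, cap);
    if (p == NULL) {
        perror("realloc");
        exit(EXIT_FAILURE);
    }
    b->data = p;
    b->cap = cap;
}

/* Reset to the empty string, keeping the allocation for reuse. */
static void buffer_clear(StringBuffer *b)
{
    buffer_reserve(b, 0);
    b->len = 0;
    b->data[0] = '\0';
}

/* Append n bytes from s and re-terminate. */
static void buffer_append_n(StringBuffer *b, const char *s, size_t n)
{
    buffer_reserve(b, n);
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* The rules pass strbuf by name, so wrap the pointer-taking functions. */
#define ClearBuffer(b)           buffer_clear(&(b))
#define AppendBufferString(b, s) buffer_append_n(&(b), (s), strlen(s))
#define AppendBufferChar(b, c) \
    do { char c_ = (char)(c); buffer_append_n(&(b), &c_, 1); } while (0)
#define BufferData(b)            ((b).data)
```

With definitions like these in the `%{ ... %}` block, the rules can call `ClearBuffer(strbuf)` and the other helpers as written. Macros are used here only because the rules pass `strbuf` by name rather than by address; plain functions taking a `StringBuffer *` would work just as well if the calls were changed to pass `&strbuf`.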

answered Oct 31 '22 10:10

Chris Dodd

Tags: c, string, bison, flex-lexer, lex
