Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where to free memory in Bison/Flex?

I'm using Bison & Flex for 1 month more or less, so I'm sorry if I don't see something obvious (but I don't think it is).

I have a problem about freeing memory with Flex Bison. Here is what my code looks like:

 parser.l

 {DATE}      { yylval.str= strdup(yytext);
             pair<string,string> newpair = make_pair("DATE",yytext);
             myvector.push_back(newpair);
              return TOKEN_DATE ;}

This is one of the example of my .l file. I copy the value of yytext into yylval.str. Then I create a new pair with that content (key/value, actually), then I return the token date for bison. My parser .y is not more than yyparse; and when something is caught, it just prints it.

I tried to run valgrind on this and I have multiple errors concerning strdup. I know this function uses malloc, but I have no idea WHERE and WHEN to use FREE.

I probably guess it's in the .y file, but where ?

 test:
      TOKEN_DATE                 { cout << $1 << endl; // here ? and what to free ?}

I don't really get all of this, I would really appreciate a simple and clear explanation.

Thanks in advance,


EDIT:

I tried several things like:

 test:
      TOKEN_DATE TOKEN_TOTO TOKEN_BLABLA { cout << $1 << endl; free($1); free($2);}
    | TOKEN_DATE test { cout << $1 << endl, free($1); }

It seems to compile and execute good, but valgrind still says to me that there is a problem with the malloc contained in the strdup function. But I cannot write free(yylval.str) inside the flex file, otherwise, bison will not be aware of the value (if I understood correctly, I tried it doesn't work). I really have no idea on how to remove this leaking problem.

like image 568
blackmesa Avatar asked Mar 20 '14 13:03

blackmesa


People also ask

What are flex and bison?

As UNIX® developers know, Flex and Bison are powerful tools for developing lexical and grammar parsers, in particular language compilers and interpreters.

What makes bison so special?

The first Bison feature of interest, hidden deep in the Bison manuals, is that it is possible to generate more meaningful error messages in case of a syntax error by using the macro YYERROR_VERBOSE. This message is much better for debugging.

How does bison handle syntax errors?

The first Bison feature of interest, hidden deep in the Bison manuals, is that it is possible to generate more meaningful error messages in case of a syntax error by using the macro YYERROR_VERBOSE. This message is much better for debugging. With the old error messages, it is not easy to identify semantic errors.

What is the grammar in the bison Handbook?

The grammar is very much the same as in the Bison handbook except for the use of names as terminal symbols and the introduction of identifiers. An identifier is defined and initialized in an assignment and can be used anywhere a value is allowed. Listing 3 shows a sample grammar: Listing 3. Sample Bison grammar


3 Answers

You need to free the copied string once you no longer need it. In your rather simple case you could free($1) after printing it out, but it is often the case that the parser inserts the copied string into some datastructure, in which case that datastructure becomes the owner of the malloc'd storage, and the call to free will be performed in a destructor.

It's not really different from any other resource management issue; you need to always be clear who is the owner of an allocated resource because the owner has the responsibility for releasing the resource when it is no longer needed.

What's going on internally is that bison maintains a stack of semantic values, each of which has the type YYSTYPE (i.e. "semantic type"), which is also the type of yylval. When a token is shifted onto the stack, bison copies yylval onto the top of the stack. Before executing an action corresponding to a production, bison arranges for the semantic values of each terminal and non-terminal in the production to be known as $1, $2, etc. (This is not a copy; the various $x symbols are replaced with a reference to a location on the bison stack.)

Non-terminals also have semantic values, because each action stores a value into the pseudo-variable $$. (If the action doesn't do this, the value of $$ is unpredictable, but it still exists.) After the action finishes, bison removes the $1, $2... values from the top of the stack, and then copies the pseudo-variable $$ onto the top of the stack. It does not do anything to the values which were popped, so if they need to be freed or otherwise destructed, the action must do this itself.

Because semantic values are copied naively, the semantic type should not contain any C++ object which is not trivially copyable.

If you use the %union declaration, then the semantic type YYSTYPE is a union object, and you need to tell bison which union tag applies to each terminal and non-terminal. In that case, $$ and all the $n have the correct .tag appended to them automatically, and the actions become somewhat more type-safe.

like image 136
rici Avatar answered Oct 11 '22 08:10

rici


From the flex manual:

21.3 A Note About yytext And Memory

When flex finds a match, yytext points to the first character of the match in the input buffer. The string itself is part of the input buffer, and is NOT allocated separately. The value of yytext will be overwritten the next time yylex() is called. In short, the value of yytext is only valid from within the matched rule's action.

Therefore shouldn't make_pair be the following?

pair<string,string> newpair = make_pair("DATE",yylval.str);

if so, then the string should be freed when cleaning up the pairs memory.

like image 37
Paul Humphreys Avatar answered Oct 11 '22 08:10

Paul Humphreys


The easy answer: it depends. You should free memory when you are done using it, whenever that may be.

The more helpful answer: try using as few allocations of memory as possible. If you never malloc any memory, you never have to free any memory. In the example you give, you pattern match on a date. Typically dates hold to a format and have some upper bound in length; for example, format 'yyyy/mm/dd' has 10 characters in it. If you can expect this, instead of creating a new string to hold the date, you can simply put a statically sized character array into yylval, such as char date[10];.

like image 32
Josh Avatar answered Oct 11 '22 06:10

Josh