How does flex support bison-location exactly?

I'm trying to use flex and bison to create a filter, because I want to extract certain grammar elements from a complex language. My plan is to use flex + bison to recognise the grammar and dump out the locations of the elements of interest. (Then I would use a script to grab the text according to the locations dumped.)

I found that flex can support a bison feature called bison-locations, but how exactly does it work? I tried the example in the flex documentation, and it seems that yylloc is not set automatically by flex: I always get (1,0)-(1,0). Can flex calculate each token's location automatically? If not, what interface function am I expected to implement? Is there an example?

Is there a better solution in terms of tools?

Best Regards, Kevin

Edit:

Now the interface for yylex has changed to:

    int yylex(YYSTYPE *yylval_param, YYLTYPE *yylloc_param);

The bison manual does not specify how the lexer should set yylloc_param correctly. For me it is hard to track the column number of each token manually.

asked Mar 18 '09 01:03

Kevin Yu

2 Answers

The yylex declaration probably changed because you enabled a reentrant or pure parser. Many documents around the web suggest that this is required for Bison locations to work, but it is not.

I needed line numbers too and found the Bison documentation confusing in that regard. The simple solution uses the global variable yylloc. In your Bison file, just add the %locations directive:

    %{
    ...
    %}
    %locations
    ...
    %%
    ...

In your lexer:

    %{
    ...
    #include "yourparser.tab.h"  /* This is where it gets the definition for yylloc from */
    #define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno;
    %}
    %option yylineno
    ...
    %%
    ...

The YY_USER_ACTION macro is "called" before each of your token actions and updates yylloc. Now you can use the @N/@$ location references in your grammar actions, like this:

    statement : error ';'   { fprintf(stderr, "Line %d: Bad statement.\n", @1.first_line); }

Or use the yylloc global variable:

    void yyerror(const char *s)
    {
        fprintf(stderr, "ERROR line %d: %s\n", yylloc.first_line, s);
    }

answered Oct 20 '22 10:10

Shlomi Loubaton

Neither bison nor flex updates yylloc automatically, but it's actually not difficult to do it yourself, if you know the trick.

The trick to implementing yylloc support is that, even though yyparse() declares yylloc, it never changes it. That means that if you modify yylloc in one call to the lexer, you'll find the same values in it at the next call. Thus, on entry to the lexer, yylloc still holds the position of the previous token. Since the previous token's end is the same as the current token's start, you can use the old yylloc value to help you determine the new value.

In other words, yylex() should not calculate yylloc; it should update yylloc.

To update yylloc, we must first copy the last_ values to first_, and then advance the last_ values by the extent of the just-matched token. (This is not the strlen() of the token; it is its extent in lines and columns.) We can do this in the YY_USER_ACTION macro, which runs just before every lexer action. That way, when a rule matches but returns no token (for instance, a rule that skips whitespace or comments), the position still advances past the skipped text, so that text is neither counted as part of the next real token nor dropped in a way that throws the tracking off.

Here's a version meant for a reentrant parser; you could modify it for a non-reentrant parser by swapping the -> operators for .:

    #define YY_USER_ACTION \
        yylloc->first_line = yylloc->last_line; \
        yylloc->first_column = yylloc->last_column; \
        for(int i = 0; yytext[i] != '\0'; i++) { \
            if(yytext[i] == '\n') { \
                yylloc->last_line++; \
                yylloc->last_column = 0; \
            } \
            else { \
                yylloc->last_column++; \
            } \
        }

If you'd prefer, you could instead put that code in a function and make the macro call the function, but the two techniques are equivalent.
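For what it's worth, the function variant might look roughly like the sketch below. The names Location and advance_location are made up for this illustration; Location only repeats the four fields of Bison's default YYLTYPE so the snippet compiles on its own, and in a real lexer you would take a YYLTYPE * from the generated parser header instead.

```c
#include <stddef.h>

/* Same four fields as Bison's default YYLTYPE, redeclared here
   (as an assumption) so the sketch is self-contained. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} Location;

/* Hypothetical helper: move `loc` past the text that was just matched. */
void advance_location(Location *loc, const char *text)
{
    /* The new token starts where the previous one ended. */
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;

    /* Walk the matched text, counting lines and columns. */
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            loc->last_line++;
            loc->last_column = 0;
        } else {
            loc->last_column++;
        }
    }
}
```

Assuming the helper is declared to take YYLTYPE *, the macro then shrinks to something like `#define YY_USER_ACTION advance_location(yylloc, yytext);` in a reentrant lexer, or `advance_location(&yylloc, yytext);` in a non-reentrant one.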

answered Oct 20 '22 12:10

Becca Royal-Gordon
