Flex default rule

How do I customize the default action in flex? I found something like <*>, but when I run the scanner it reports "flex scanner jammed". Adding a . rule only adds another rule, so that does not work either. What I want is:

comment               "/*"[^"*/"]*"*/"

%%
{comment}             return 1;
{default}             return 0; 
<<EOF>>               return -1;

Is it possible to change the matching behavior from longest match to first match? If so, I would do something like this:

default               (.|\n)*

but because this almost always gives a longer match it will hide the comment rule.

EDIT

I found the {-} operator in the manual; however, this example straight from the manual gives me "unrecognized rule":

[a-c]{-}[b-z]

user877329 asked Dec 28 '22 02:12



2 Answers

The flex default rule matches a single character and prints it on standard output. If you don't want that action, write an explicit rule which matches a single character and does something else.
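
As a minimal sketch of such an explicit rule, the specification below uses the return codes from the question (1 for a comment, 0 for anything else, -1 at end of input). The comment pattern and the main function are assumptions added for illustration and are not taken from the question.

```lex
%option noyywrap nounput noinput

%{
#include <stdio.h>
%}

comment   "/*"([^*]|\*+[^*/])*\*+"/"

%%
{comment}   { /* a complete C-style comment */ return 1; }
.|\n        { /* catch-all: exactly one character, listed last */ return 0; }
<<EOF>>     { return -1; }
%%

int main(void)
{
    int tok;

    /* Print each token code together with the text that produced it. */
    while ((tok = yylex()) != -1)
        printf("%d: \"%s\"\n", tok, yytext);
    return 0;
}
```

Because the catch-all consumes only one character, it can never be longer than a real comment match, so the longest-match rule still prefers {comment} wherever a complete comment starts. Since every character is now covered by an explicit rule, the built-in default rule never fires.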

The pattern (.|\n)* matches the entire input file as a single token, so that is a very bad idea. You're thinking that the default should be a long match, but in fact you want it to be as short as possible (but not empty).

The purpose of the default rule is to do something when there is no match for any of the tokens in the input language. When lex is used for tokenizing a language, such a situation is almost always erroneous because it means that the input begins with a character which is not the start of any valid token of the language.

Thus, a "catch any character" rule is coded as a form of error recovery. The idea is to discard the bad character (just one) and try tokenizing from the character after that one. This is only a guess, but it's a good guess because it's based on what is known: namely that there is one bad character in the input.

The recovery rule can be wrong. For instance suppose that no token of the language begins with @, and the programmer wanted to write the string literal "@abc". Only, she forgot the opening " and wrote @abc". The right fix is to insert the missing ", not to discard the @. But that would require a much more clever set of rules in the lexer.

Anyway, when discarding a bad character, you usually want to issue an error message such as "skipping invalid character '~' in line 42, column 3".
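
One way such a recovery rule might look is sketched below. The token set (identifiers and whitespace) and the hand-maintained column counter are hypothetical and exist only to make the example complete; the line number comes from flex's yylineno option.

```lex
%option noyywrap nounput noinput yylineno

%{
#include <stdio.h>

/* Hypothetical column counter; flex tracks lines but not columns. */
static int column = 1;
%}

%%
[A-Za-z_][A-Za-z0-9_]*   { /* valid token; a real lexer would return it */ column += yyleng; }
[ \t]+                   { column += yyleng; }
\n                       { column = 1; }
.                        {
                           /* Error recovery: report and skip exactly one bad character. */
                           fprintf(stderr,
                                   "skipping invalid character '%c' in line %d, column %d\n",
                                   yytext[0], yylineno, column);
                           column++;
                         }
%%

int main(void)
{
    yylex();
    return 0;
}
```

The catch-all is placed last so that, for single-character matches, any earlier rule that also matches takes priority over it.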

The default rule/action of copying the unmatched character to standard output is useful when lex is used for text filtering. The default rule then brings about the semantics of a regex search (as opposed to a regex match): the idea is to search the input for matches of the lexer's token-recognizing state machine, while printing all material that is skipped by that search.

So for instance, a lex specification containing just the rule:

 "foo" { printf("bar"); }

will implement the equivalent of

 sed -e 's/foo/bar/g'
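
For reference, a complete specification built around that one rule could look like the sketch below. The options line and the main function are additions needed to produce a standalone program; the file name filter.l used afterwards is only an example.

```lex
%option noyywrap nounput noinput

%{
#include <stdio.h>
%}

%%
"foo"   { printf("bar"); }
%%

int main(void)
{
    /* Everything not matched by "foo" is echoed by the default rule. */
    yylex();
    return 0;
}
```

Running flex filter.l and compiling the generated lex.yy.c with a C compiler yields a filter that reads standard input and writes standard output, replacing each foo with bar and copying all other text unchanged.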
Kaz answered Jan 18 '23 11:01



I solved the problem manually instead of trying to match the complement of a rule. This works fine because the pattern involved in this case is quite simple.

user877329 answered Jan 18 '23 11:01
