
Parsing an int(x) parameter

Here is a simple function with one int parameter:

void f(int x) {}

f(42);

And here is another function with one int parameter:

void g(int(x)) {}

g(42);

Now let us define x to be a type:

typedef int x;
void h(int(x)) {}

h(42);
// warning: passing argument 1 of ‘h’ makes pointer from integer without a cast

(This is the behavior I observe with gcc 4.8.2)

How do parser writers deal with this situation?

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

fredoverflow asked Feb 28 '15 19:02



2 Answers

You've effectively defined h as:

void h(int(int)) {}

The parameter is interpreted as an unnamed function pointer that takes an int and returns an int. When you try to pass 42 to it, the compiler complains that you are trying to make a function pointer from an integer.
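As a minimal sketch of what that parameter type accepts, the following keeps the parameter list from the question as a prototype and calls h with a function instead of an integer. The return type int, the helper twice, the parameter name fn, and the driver call_h_with_function are all hypothetical names introduced only for this illustration.

```c
#include <assert.h>

typedef int x;

/* Same parameter list as in the question. Because x names a type here,
   int(x) is an unnamed parameter of function type int(int), which is
   adjusted to the pointer type int (*)(int). */
int h(int(x));

/* Hypothetical callback, used only for this illustration. */
static int twice(int n) { return 2 * n; }

/* Compatible definition of h; the parameter gets a name so it can be called. */
int h(int (*fn)(int)) { return fn(21); }

/* Hypothetical driver: passes a function rather than the integer 42. */
int call_h_with_function(void) {
    int result = h(twice);
    assert(result == 42);
    return result;
}
```

Calling h(twice) compiles without the warning from the question, because a function name converts to a pointer to that function.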

I think what you are asking is how compilers handle (unnamed) function pointer types and their possibly ambiguous parses. Your question is related to the most vexing parse in C++.

There, the rule is that whenever a construct can be parsed either as a declaration (typically a function declaration) or as something else, it is parsed as the declaration. That choice works because there are other ways to disambiguate when you do not want the declaration (e.g. surround the argument in parentheses, use {} initializer syntax, etc.).

As for how a parser writer might deal with this parse, here is a lexical analyzer and grammar for C11: http://quut.com/c/ANSI-C-grammar-l-2011.html

In your example, x is an IDENTIFIER token before the typedef and a TYPEDEF_NAME token after it, because the symbol table tells the analyzer that x now names a type. In this particular case the parse is then unambiguous. The "pipeline feedback" you seem to be referring to happens through the symbol table: the higher levels give the lexical analyzer context that changes its output as compilation progresses.
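The feedback through the symbol table can be sketched roughly as follows. This is not the code from the linked lexer: only the token names IDENTIFIER and TYPEDEF_NAME come from the text above, while declare_typedef, classify, the fixed-size table, and lexer_feedback_demo are hypothetical names assumed for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Token kinds named as in the answer; everything else is hypothetical. */
enum token_kind { IDENTIFIER, TYPEDEF_NAME };

#define MAX_TYPEDEFS 16
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Called by the parser once it has recognized a typedef declaration. */
void declare_typedef(const char *name) {
    if (typedef_count < MAX_TYPEDEFS)
        typedef_names[typedef_count++] = name;
}

/* Called by the lexer for every identifier-shaped lexeme. */
enum token_kind classify(const char *lexeme) {
    for (int i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], lexeme) == 0)
            return TYPEDEF_NAME;
    return IDENTIFIER;
}

/* Replays the question: g is lexed before the typedef, h after it. */
int lexer_feedback_demo(void) {
    assert(classify("x") == IDENTIFIER);    /* void g(int(x)) {} */
    declare_typedef("x");                   /* typedef int x;    */
    assert(classify("x") == TYPEDEF_NAME);  /* void h(int(x)) {} */
    return 1;
}
```

The same lexeme x yields two different tokens depending on what the parser has already seen, which is why the grammar itself can stay unambiguous for this input.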

EDIT: These three articles, found by the OP, describe this problem and how some C parsers / compilers solve it very nicely. Basically, a context-free grammar (CFG) that accepts / generates only legal C syntax can almost be specified. Once a scoped lookup table lets the lexical analyzer distinguish identifiers from typedef names, a CFG, and more importantly an LALR(1) parser (e.g. one generated by yacc), that accepts / generates only legal C syntax can be specified.

Here's an even scarier example than the OP's:

typedef int x;

int main() { x x = 5; return x; }  /* crazily enough this is legal C syntax and a well formed C program */
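Why the lookup table has to be scoped can be sketched with the program above: inside main the name x first denotes the type and then, once declared, the variable, and at the closing brace it reverts to the type. The sketch below only models the table. Deciding that the second x in "x x = 5;" is a declarator is the parser's job and is not shown. All names here (struct symbol, enter_scope, leave_scope, declare, lookup, shadowing_demo) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum token_kind { IDENTIFIER, TYPEDEF_NAME };

struct symbol { const char *name; enum token_kind kind; int depth; };

#define MAX_SYMBOLS 32
static struct symbol symbols[MAX_SYMBOLS];
static int symbol_count;
static int depth;               /* 0 is file scope */

void enter_scope(void) { depth++; }

/* Drop every symbol declared in the scope that is being closed. */
void leave_scope(void) {
    while (symbol_count > 0 && symbols[symbol_count - 1].depth == depth)
        symbol_count--;
    depth--;
}

void declare(const char *name, enum token_kind kind) {
    if (symbol_count < MAX_SYMBOLS)
        symbols[symbol_count++] = (struct symbol){ name, kind, depth };
}

/* Search newest to oldest so the innermost declaration wins. */
enum token_kind lookup(const char *name) {
    for (int i = symbol_count - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].kind;
    return IDENTIFIER;          /* unknown names are plain identifiers */
}

int shadowing_demo(void) {
    declare("x", TYPEDEF_NAME);           /* typedef int x;                  */
    enter_scope();                        /* int main() {                    */
    assert(lookup("x") == TYPEDEF_NAME);  /* first x in "x x = 5;" is a type */
    declare("x", IDENTIFIER);             /* second x declares a variable    */
    assert(lookup("x") == IDENTIFIER);    /* "return x;" uses the variable   */
    leave_scope();                        /* }                               */
    assert(lookup("x") == TYPEDEF_NAME);  /* at file scope x is a type again */
    return 1;
}
```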
jschultz410 answered Oct 05 '22 19:10



After introducing the typedef

typedef int x;

the function has the following definition

void h(int( int ) ) {}

That is, its parameter is declared with the function type int( int ), which is adjusted to a pointer to function.

You call the function supplying an integer:

h(42);

There is no implicit conversion from integer to function pointer.

I do not see a problem with

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

The typedef name in the parameter is replaced by the type it denotes. The compiler has x recorded as a type name, so it reads the declaration as

type-specifier h(type-specifier( type-name ) ) {}
Vlad from Moscow answered Oct 05 '22 18:10
