
How is typecasting parsed by C compilers?

Tags: c, parsing

Here it is syntactically impossible to tell whether f/g are function calls or typecasts without knowing how they are declared. Do compilers know the difference in the parse step, or do they usually resolve this in a second pass?

void f(int x){}
typedef short g;

int main(void){
   ((f)(1));
   ((g)(1));
   return 0;
}
Andrew Johnson asked Jul 21 '14 23:07


2 Answers

Very early versions of C (before the first edition of K&R was published in 1978) did not have the typedef feature. In that version of C, a type name could always be recognized syntactically. int, float, char, struct, and so forth are keywords; other elements of a type name are punctuation symbols such as * and []. (A parser can tell keywords apart from ordinary identifiers because the set of keywords is small and fixed.)

When typedef was added, it had to be shoehorned into the existing language. A typedef creates a new name for an existing type. That name is a single identifier -- which is not syntactically different from any other ordinary identifier.

A C compiler must maintain a symbol table as it parses its input. When it encounters an identifier, it needs to consult the symbol table to determine whether it's a type name. Without that information, the grammar is ambiguous.

In a sense, a typedef declaration can be thought of as creating a new temporary keyword. But it's a keyword that can be hidden by a new declaration in an inner scope.

For example:

{
    typedef short g;
    /* g is now a type name, and the parser has
     * to treat it almost like a keyword
     */
    {
        int g;
        /* now g is an ordinary identifier as far as the parser is concerned */
    }
    /* And now g is a type name again */
}
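
To make the scoping behavior above concrete, here is a minimal sketch of the kind of table a parser could consult. Every name in it (enter_scope, leave_scope, declare, is_type_name, MAX_SYMS) is hypothetical and chosen only for illustration; real compilers use much more elaborate structures.

```c
#include <string.h>

/* Hypothetical fixed-size table; a real compiler would grow it dynamically. */
#define MAX_SYMS 64

struct sym {
    const char *name;  /* identifier spelling */
    int is_type;       /* 1 if introduced by typedef, 0 otherwise */
    int depth;         /* block nesting depth where it was declared */
};

static struct sym syms[MAX_SYMS];
static int nsyms = 0;
static int depth = 0;

/* Called when the parser sees '{'. */
void enter_scope(void) { depth++; }

/* Called when the parser sees '}': forget everything declared in that block. */
void leave_scope(void) {
    if (depth == 0) return;  /* nothing to close at file scope */
    while (nsyms > 0 && syms[nsyms - 1].depth == depth)
        nsyms--;
    depth--;
}

/* Record a declaration in the current scope; returns -1 if the table is full. */
int declare(const char *name, int is_type) {
    if (nsyms == MAX_SYMS) return -1;
    syms[nsyms].name = name;
    syms[nsyms].is_type = is_type;
    syms[nsyms].depth = depth;
    nsyms++;
    return 0;
}

/* Search innermost declarations first, so an inner "int g" hides an outer typedef. */
int is_type_name(const char *name) {
    for (int i = nsyms - 1; i >= 0; i--)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].is_type;
    return 0;  /* undeclared: treat as an ordinary identifier */
}
```

Replaying the example: after enter_scope() and declare("g", 1), is_type_name("g") reports a type name; after a nested enter_scope() and declare("g", 0) it reports an ordinary identifier; after leave_scope() it reports a type name again.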

Parsing C is hard.

Keith Thompson answered Nov 02 '22 08:11


I think they do it lazily: whenever an identifier token is read, parsing of the next token is delayed until that identifier's semantic information has been looked up. By the time the next token is parsed, the compiler already knows whether the identifier refers to a type name or not (it must have been declared earlier), and can act accordingly. (So in this approach the semantic and syntactic analyses are intertwined and cannot be separated.)
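
As a rough sketch of that idea applied to the question's ((f)(1)) and ((g)(1)): once the parser has read "(" identifier ")" and sees another "(", it could ask what the identifier was declared as before choosing a grammar rule. The names below (note_typedef, classify_paren_ident, the fixed-size array) are hypothetical stand-ins for a real symbol table, not any particular compiler's API.

```c
#include <string.h>

enum paren_form { FORM_CALL, FORM_CAST };

/* Hypothetical stand-in for the symbol table: typedef names seen so far. */
#define MAX_TYPEDEFS 16
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedef_names = 0;

/* Called when a declaration such as "typedef short g;" has been parsed. */
void note_typedef(const char *name) {
    if (n_typedef_names < MAX_TYPEDEFS)
        typedef_names[n_typedef_names++] = name;
}

static int is_typedef_name(const char *name) {
    for (int i = 0; i < n_typedef_names; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return 1;
    return 0;
}

/* Decide how to parse "(" ident ")" "(" ... ")":
 * a typedef name makes it a cast, anything else a call
 * through a parenthesized expression. */
enum paren_form classify_paren_ident(const char *ident) {
    return is_typedef_name(ident) ? FORM_CAST : FORM_CALL;
}
```

With note_typedef("g") recorded while parsing the typedef line, classify_paren_ident("g") yields FORM_CAST and classify_paren_ident("f") yields FORM_CALL, which is the distinction the question asks about. This only works because C requires the typedef to appear before its use.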

user541686 answered Nov 02 '22 10:11