How is typecasting parsed by C compilers?

Question

In the code below, it is syntactically impossible to tell whether (f)(1) and (g)(1) are function calls or typecasts without knowing how f and g are declared. Do compilers tell the difference during the parse step, or do they usually resolve it in a second pass?

void f(int x){}      /* f is a function */
typedef short g;     /* g is a type name */

int main(void){
   ((f)(1));         /* function call: f(1) */
   ((g)(1));         /* cast: (short)1 */
   return 0;
}

Keith Thompson · Accepted Answer

Very early versions of C (before the first edition of K&R was published in 1978) did not have the typedef feature. In those versions of C, a type name could always be recognized syntactically. int, float, char, struct, and so forth are keywords; other elements of a type name are punctuation symbols such as * and []. (Parsers can distinguish keywords from other identifiers, since the set of keywords is small and fixed.)

When typedef was added, it had to be shoehorned into the existing language. A typedef creates a new name for an existing type. That name is a single identifier -- which is not syntactically different from any other ordinary identifier.

A C compiler must maintain a symbol table as it parses its input. When it encounters an identifier, it needs to consult the symbol table to determine whether it's a type name. Without that information, the grammar is ambiguous.
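As a minimal sketch of that lookup (not how any particular compiler implements it), the lexer can ask a table of typedef names how to classify each identifier. The names token_kind, declare_typedef, and classify_identifier are hypothetical, and the fixed-size array stands in for a real symbol table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; a real compiler has many more. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

/* Toy symbol table: names currently known to be typedef names. */
#define MAX_TYPEDEFS 16
static const char *typedef_names[MAX_TYPEDEFS];
static size_t typedef_count;

/* Called when the parser finishes a typedef declaration.
 * Returns 0 on success, -1 if the toy table is full. */
int declare_typedef(const char *name)
{
    if (typedef_count == MAX_TYPEDEFS)
        return -1;
    typedef_names[typedef_count++] = name;
    return 0;
}

/* Called for every identifier the lexer scans. */
enum token_kind classify_identifier(const char *name)
{
    for (size_t i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPE_NAME;
    return TOK_IDENTIFIER;
}
```

With the question's declarations, classify_identifier("g") would report a type name only after declare_typedef("g") has run, so (g)(1) can be parsed as a cast while (f)(1) is parsed as a call.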

In a sense, a typedef declaration can be thought of as creating a new temporary keyword. But it is a keyword that can be hidden by a new declaration in an inner scope.

For example:

void example(void)
{
    typedef short g;
    /* g is now a type name, and the parser has
     * to treat it almost like a keyword
     */
    {
        int g;
        /* now g is an ordinary identifier as far as the parser is concerned */
    }
    /* And now g is a type name again */
}
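The scoping behavior in that example could be modeled roughly as follows. This is a sketch under stated assumptions: the names sym_kind, enter_scope, leave_scope, declare, and lookup are hypothetical, and a flat array searched from the newest entry backwards stands in for whatever structure a real compiler uses.

```c
#include <stddef.h>
#include <string.h>

enum sym_kind { SYM_NONE, SYM_TYPEDEF, SYM_ORDINARY };

struct symbol {
    const char *name;
    enum sym_kind kind;
    int depth;              /* nesting level of the declaring scope */
};

#define MAX_SYMBOLS 64
static struct symbol symbols[MAX_SYMBOLS];
static size_t symbol_count;
static int current_depth;

void enter_scope(void) { current_depth++; }

/* Leaving a scope discards every declaration made inside it. */
void leave_scope(void)
{
    while (symbol_count > 0 &&
           symbols[symbol_count - 1].depth == current_depth)
        symbol_count--;
    current_depth--;
}

/* Returns 0 on success, -1 if the toy table is full. */
int declare(const char *name, enum sym_kind kind)
{
    if (symbol_count == MAX_SYMBOLS)
        return -1;
    symbols[symbol_count++] = (struct symbol){ name, kind, current_depth };
    return 0;
}

/* Search newest-first, so an inner declaration hides an outer one. */
enum sym_kind lookup(const char *name)
{
    for (size_t i = symbol_count; i > 0; i--)
        if (strcmp(symbols[i - 1].name, name) == 0)
            return symbols[i - 1].kind;
    return SYM_NONE;
}
```

Mirroring the example: after enter_scope() and declare("g", SYM_TYPEDEF), lookup("g") reports a typedef; a nested enter_scope() and declare("g", SYM_ORDINARY) hides it; leave_scope() makes the typedef visible again.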

Parsing C is hard.

user541686 · Answer

I think they do it lazily: after a token is parsed, parsing of the next token is delayed until the semantic information for that symbol is known. Then, when the next token is parsed, the compiler already knows whether the symbol being referred to is a type name (it must have been declared earlier) and can act accordingly.
(So in this approach the semantic and syntactic analyses are intertwined and cannot be separated.)
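A small illustration of why the earlier declaration matters: the same token sequence (g)(1) is a cast in one function and a call in the other, depending only on what g was declared as beforehand. The function names and the record helper are made up for this sketch.

```c
#include <stddef.h>

static int last_arg;

/* Hypothetical helper that remembers the argument it was called with. */
static void record(int x) { last_arg = x; }

/* Here g is a typedef name, so (g)(1) is a cast to short. */
size_t size_when_g_is_type(void)
{
    typedef short g;
    return sizeof((g)(1));
}

/* Here g is a function pointer, so (g)(1) is a call. */
int value_when_g_is_function(void)
{
    void (*g)(int) = record;
    last_arg = 0;
    ((g)(1));
    return last_arg;
}
```

The first function returns sizeof(short); the second returns 1, because record was actually called with the argument 1.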

Tags: c, parsing

Andrew Johnson
