
How does a C parser distinguish between a type cast and a function call in general?

I'm trying to write a C parser, for my own education. I know that I could use tools like YACC to simplify the process, but I want to learn as much as possible from the experience, so I'm starting from scratch.

My question is how I should handle a line like this:

doSomethingWith((foo)(bar));

It could be that (foo)(bar) is a type cast, as in:

typedef int foo;

void doSomethingWith(foo aFoo) { ... }

int main() {
    float bar = 23.6;

    doSomethingWith((foo)(bar));

    return 0;
}

Or, it could be that (foo)(bar) is a function call, as in:

int foo(int bar) { return bar; }

void doSomethingWith(int anInt) { ... }

int main() {
    int bar = 10;

    doSomethingWith((foo)(bar));

    return 0;
}

It seems to me that the parser cannot determine which of the two cases it is dealing with solely by looking at the line doSomethingWith((foo)(bar)); on its own. This annoys me, because I was hoping to be able to separate the parsing stage from the "interpretation" stage, where you actually determine that the line typedef int foo; means that foo is now a valid type. In my imagined scenario, Type a = b + c * d would parse just fine even if Type, a, b, c, and d aren't defined anywhere, and problems would only arise later, when actually trying to "resolve" the identifiers.

So, my question is: how do "real" C parsers deal with this? Is the separation between the two stages that I was hoping for just a naive wish, or am I missing something?

Ord asked Sep 07 '13 20:09



1 Answer

Historically, typedefs were a relatively late addition to C. Before they were added to the language, type names consisted of keywords (int, char, double, struct, etc.) and punctuation characters (*, [], ()), and so were easy to recognize unambiguously. An identifier could never be a type name, so an identifier in parentheses followed by an expression could not be a cast expression.

Typedefs made it possible for a user-defined identifier to be a type name, which rather seriously messed up the grammar.

Take a look at the syntax of type-specifier in the C standard (I'll use the C90 version since it's slightly simpler):

type-specifier:
    void
    char
    short
    int
    long
    float
    double
    signed
    unsigned
    struct-or-union-specifier
    enum-specifier
    typedef-name

All but the last can be easily recognized because they either are keywords, or start with a keyword. But a typedef-name is just an identifier.

When a C compiler processes a typedef declaration, it needs, in effect, to introduce the typedef name as a new keyword. This means that, unlike in a language with a context-free grammar, there needs to be feedback from the symbol table to the parser.
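To make that feedback concrete, here is a minimal sketch of how a hand-written parser might consult a table of typedef names when it sees an identifier in parentheses. All the names here (declare_typedef, is_typedef_name, classify_parenthesized, the fixed-size table) are hypothetical and chosen for illustration; real compilers use a full symbol table, and this sketch ignores scoping entirely.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, minimal "symbol table": only the names declared via typedef. */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count = 0;

/* The parser would call this after finishing a declaration
   such as "typedef int foo;". */
void declare_typedef(const char *name) {
    assert(typedef_count < MAX_TYPEDEFS);
    typedef_names[typedef_count++] = name;
}

/* Returns 1 if name is currently recorded as a typedef name, else 0. */
int is_typedef_name(const char *name) {
    for (int i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

enum paren_kind {
    PAREN_CAST, /* "(foo)" is a type name: parse a cast expression */
    PAREN_EXPR  /* "(foo)" is a parenthesized expression, e.g. a callee */
};

/* Decide how to treat "( identifier )" when an operand follows it.
   The answer depends on earlier declarations, not on the tokens alone. */
enum paren_kind classify_parenthesized(const char *identifier) {
    return is_typedef_name(identifier) ? PAREN_CAST : PAREN_EXPR;
}
```

With this in place, (foo)(bar) from the question is classified as PAREN_EXPR until declare_typedef("foo") has been called, and as PAREN_CAST afterwards. The same token sequence gets two different parses depending on what has been declared so far.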

And even that's a bit of an oversimplification. A typedef name can still be redefined, either as another typedef or as something else, in an inner scope:

{
    typedef int foo; /* foo is a typedef name */
    {
        int foo;     /* foo is now an ordinary identifier, an object name */
    }
                     /* And now foo is a typedef name again */
}

So a typedef name is effectively a user-defined keyword if it's used in a context where a type name is valid, but is still an ordinary identifier if it's redeclared.
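The scoping behaviour above can be sketched with a small stack-like symbol table, where a lookup returns the innermost declaration of a name. This is only an illustration under simplifying assumptions: the names (enter_scope, leave_scope, declare, scoped_is_typedef_name) and the fixed-size array are hypothetical, and C's separate name spaces for tags, labels, and members are ignored.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 128

struct symbol {
    const char *name;
    int is_typedef; /* 1 for a typedef name, 0 for an ordinary identifier */
    int depth;      /* block nesting depth at which it was declared */
};

static struct symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;
static int current_depth = 0;

/* Called when the parser sees "{". */
void enter_scope(void) {
    current_depth++;
}

/* Called when the parser sees "}": discard declarations made in this block. */
void leave_scope(void) {
    assert(current_depth > 0);
    while (symbol_count > 0 &&
           symbols[symbol_count - 1].depth == current_depth) {
        symbol_count--;
    }
    current_depth--;
}

/* Record a declaration of name in the current scope. */
void declare(const char *name, int is_typedef) {
    assert(symbol_count < MAX_SYMBOLS);
    symbols[symbol_count++] = (struct symbol){ name, is_typedef, current_depth };
}

/* Search from the most recent declaration backwards, so an inner
   redeclaration hides an outer typedef of the same name. */
int scoped_is_typedef_name(const char *name) {
    for (int i = symbol_count - 1; i >= 0; i--) {
        if (strcmp(symbols[i].name, name) == 0) {
            return symbols[i].is_typedef;
        }
    }
    return 0; /* undeclared: treat as an ordinary identifier */
}
```

Replaying the nested-block example: after enter_scope() and declare("foo", 1), foo is a typedef name; after a second enter_scope() and declare("foo", 0), it is an ordinary identifier; after leave_scope(), it is a typedef name again.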

TL;DR: Parsing C is hard.

Keith Thompson answered Sep 28 '22 05:09