C11 grammar ambiguity between _Atomic type specifier and qualifier

Tags: c, grammar, yacc, c11

I'm trying to write a lex/yacc grammar for C11 based on N1570. Most of my grammar is copied verbatim from the informative syntax summary (Annex A), but some yacc conflicts arose. I've managed to resolve all of them except one: there seems to be some ambiguity between when _Atomic is used as a type specifier and when it's used as a type qualifier.

In the specifier form, _Atomic is followed immediately by parentheses, so I'm assuming it has something to do with C's little-used syntax which allows declarators to be in parentheses, thus allowing parentheses to immediately follow a qualifier. But my grammar already knows how to differentiate typedef names from other identifiers, so yacc should know the difference, shouldn't it?

I can't for the life of me think of a case when it would actually be ambiguous.

I doubt it helps, but here's the relevant state output I get when I use yacc's -v flag. ATOMIC is my token name for _Atomic.

state 23

  152 atomic_type_specifier: ATOMIC . '(' type_name ')'
  156 type_qualifier: ATOMIC .

    '('  shift, and go to state 49

    '('       [reduce using rule 156 (type_qualifier)]
    $default  reduce using rule 156 (type_qualifier)
asked May 19 '12 at 21:05 by jbatez


2 Answers

Okay, whether or not we can come up with a grammatically ambiguous case doesn't matter. Section 6.7.2.4 paragraph 4 of N1570 states that:

If the _Atomic keyword is immediately followed by a left parenthesis, it is interpreted as a type specifier (with a type name), not as a type qualifier.

To enforce this, I made _Atomic-as-specifier and _Atomic-as-qualifier separate tokens in my lex rules, using trailing context to look ahead for the parenthesis:

"_Atomic"/{WHITESPACE}*"(" {return ATOMIC_SPECIFIER;}
"_Atomic"                  {return ATOMIC_QUALIFIER;}

I'm relatively new to lex/yacc and parser generators in general, but my gut says this is kind of a hack. At the same time, what else would the trailing context syntax in lex be for?

answered Oct 04 '22 at 19:10 by jbatez


Yes, I think there is an ambiguity in the specification. Take

_Atomic int (*f)(int);

Here _Atomic is a type qualifier. (As the return type of a function it doesn't make much sense, but I think it is valid.) Now take this alternative form:

int _Atomic (*f)(int);

Normally type qualifiers can come after the int, so this should be equivalent to the other declaration. But now _Atomic is followed by a parenthesis, so it must be interpreted as a type specifier, which is then a syntax error. I think it would even be possible to cook up an example where *f could be replaced by a valid typedef name.

Have a look at the first sentence of 6.7.2.4p4:

The properties associated with atomic types are meaningful only for expressions that are lvalues.

This clearly indicates that they don't expect function return types to be _Atomic-qualified.

Edit:

The same ambiguity would occur for

_Atomic int (*A)[3];

which makes perfect sense (a pointer to an array of three atomic integers) and which we should be able to rewrite as

int _Atomic (*A)[3];

Edit 2: To see that having a type name inside the parentheses is not a disambiguating criterion either, take the following valid C99 code:

typedef int toto;

int main(void) {
  const int toto(void);
  int const toto(void);
  const int (toto)(void);
  int const (toto)(void);
  return toto();
}

This redeclares toto inside main as a function, and all four lines are valid prototypes for that same function. Now use _Atomic as a qualifier instead:

typedef int toto;

int main(void) {
  int _Atomic (toto)(void);
  return toto();
}

This should be just as valid as the version with const. Yet here _Atomic is followed by parentheses with a type name inside, and it is meant as a qualifier, not a type specifier.

answered Oct 04 '22 at 20:10 by Jens Gustedt