
Why does the preprocessor distinguish between number and character tokens?

According to the language specification, the lexical elements are defined like this:

token:
    keyword
    identifier
    constant
    string-literal
    operator
    punctuator

preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    operator
    punctuator
    each non-white-space character that cannot be one of the above

Why is there a distinction between a number and a character on the preprocessing token level, whereas on the token level, there are only constants? I don't see the benefit in this distinction.

fredoverflow asked Feb 25 '15 18:02


1 Answer

The names of the non-terminals in the C grammars are not normative; they exist only for the purpose of description. What matters is that the behaviour is correctly described. The grammar itself is not sufficient to describe the language; it needs to be read along with the text, which imposes further restrictions on well-formed programs.

There is not a one-to-one relationship between preprocessing tokens and program tokens. There is overlap: a preprocessor identifier might be a keyword, or it might be one of the various definable symbol types (including some constants and typedef-names). A pp-number might be an integer or floating constant, but it might also be invalid. The lexical productions are not all mutually exclusive, and actually assigning a lexical category to a substring of the program requires procedures described in the standard text, not in the formal grammar.
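A small sketch may make the overlap concrete. The macro and function names below (`STR`, `CAT`, `odd_number`, `pasted`, `keyword_from_paste`) are made up for illustration; only the `#` and `##` operators come from the language itself. The idea is that a preprocessing token only has to become a valid token if it survives preprocessing.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x        /* stringize: the argument never becomes a program token */
#define CAT(a, b) a##b   /* paste two preprocessing tokens into one */

/* 12abc is a single pp-number that matches no constant. Stringizing it is
   fine because only the resulting string literal reaches the compiler. */
static const char *odd_number(void) { return STR(12abc); }

/* The pp-number 1 pasted with the identifier e3 gives the pp-number 1e3,
   which does convert to a floating constant. */
static double pasted(void) { return CAT(1, e3); }

/* To the preprocessor, a keyword is just an identifier: pasting `in` and `t`
   yields the identifier `int`, which only later is classified as a keyword. */
static CAT(in, t) keyword_from_paste(void) { return 7; }

/* Hypothetical self-check tying the three cases together. */
static void check_examples(void)
{
    assert(strcmp(odd_number(), "12abc") == 0);
    assert(pasted() == 1000.0);
    assert(keyword_from_paste() == 7);
}
```

Writing `12abc` directly in an expression, outside `STR`, would be rejected, since the pp-number would then have to convert to a constant.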

Character constants pass directly from the preprocessor into the program syntax without modification (although they are then subsumed into the constant category). If the standard needs to say even one thing specifically about preprocessing numbers (such as the fact that they must be convertible into a real numeric constant if they survive the preprocessor), that is a sufficient reason to have the category.

Also, what would it add to include character-constant in the definition of pp-number? You still need both productions in order to describe the language.

rici answered Oct 23 '22 12:10