In compiler construction, is a token the same as a symbol, or just another term for one? After some research, I think I understand that a token is a symbol with a reference to the symbol table, i.e. some kind of attributed symbol, or a symbol with some additional information. Thanks for any clarification :-)
The symbol table is an important data structure used in a compiler. It stores information about the occurrences of various entities such as objects, classes, variable names, interfaces, and function names. It is used by both the analysis and synthesis phases.
Token: A token is a group of characters having collective meaning, typically a word or punctuation mark, separated out by a lexical analyzer and passed to a parser. A lexeme is an actual character sequence forming a specific instance of a token; for example, 100 is a lexeme of the token num.
Tokens are a set of strings used in a programming language. Terminals are the symbols used in production rules that cannot be expanded any further. The compiler breaks a program into its smallest units, known as tokens, which are passed through the various stages of the compiler.
A lexeme is a string of characters that is the lowest-level syntactic unit in the programming language. Lexemes are the "words" and punctuation of the programming language. A token is a syntactic category that forms a class of lexemes. Tokens are the "nouns", "verbs", and other parts of speech of the programming language.
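To make the lexeme/token distinction concrete, here is a minimal Python sketch that maps individual lexemes to the token class they belong to. The class names and regular expressions are assumptions chosen for illustration, not taken from any particular compiler.

```python
import re

# Hypothetical token classes: each pattern describes the whole class of
# lexemes that belong to that token.
TOKEN_CLASSES = [
    ("INTEGER_LITERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SEMI_COLON", r";"),
]

def classify(lexeme):
    """Return the name of the first token class matching the whole lexeme, or None."""
    for name, pattern in TOKEN_CLASSES:
        if re.fullmatch(pattern, lexeme):
            return name
    return None

# Different lexemes, same token class: "count" and "s" are both IDENTIFIERs.
for lexeme in ["count", "s", "100", "42", ";"]:
    print(lexeme, "->", classify(lexeme))
```

Many lexemes collapse into a few token classes, which is why a parser can work with token classes alone.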
A token is not necessarily a symbol in the symbol table. For example, if a token is a reserved word, then it is not entered in the symbol table. If a token is an identifier, then it will likely be entered in the symbol table.
Take for example the following declaration:
char s[100];
A lexical analyzer could output the following tokens:
<"char", IDENTIFIER>
Depending on the implementation, "char" could be recognized as a reserved word or be entered in the symbol table as a predefined type name. (In standard C, char is a keyword, so most C lexers would emit a keyword token here rather than IDENTIFIER.)
<"s", IDENTIFIER>
"s" is entered in the symbol table as a variable identifier,
<"[", OPEN_SQUARE_BRACKET>
not entered in the symbol table,
<"100", INTEGER_LITERAL>
not entered in the symbol table,
<"]", CLOSE_SQUARE_BRACKET>
not entered in the symbol table,
<";", SEMI_COLON>
not entered in the symbol table.
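The token stream above can be reproduced with a small regex-based lexer. This is only a sketch under stated assumptions: it is written in Python rather than C for brevity, the `KEYWORDS` set is an illustrative subset, and it takes the "reserved word" option for "char", so that lexeme comes out as a hypothetical KEYWORD token instead of IDENTIFIER.

```python
import re

# Illustrative subset of reserved words (assumption, not a full C keyword list).
KEYWORDS = {"char", "int", "return"}

# Token classes from the example above, plus whitespace and an error catch-all.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPEN_SQUARE_BRACKET", r"\["),
    ("CLOSE_SQUARE_BRACKET", r"\]"),
    ("SEMI_COLON", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split source text into (lexeme, token class) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words never reach the symbol table
        tokens.append((lexeme, kind))
    return tokens

for token in tokenize("char s[100];"):
    print(token)
```

Only the ("s", "IDENTIFIER") pair would be a candidate for a symbol table entry; the lexer itself does not need the table to produce any of these tokens.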
So you basically enter in the symbol table only those tokens that you need to reference later during the compilation process. E.g., later in the function body, when you find
strcpy(s, "Hello, world\n");
you recognize the token <"s", IDENTIFIER> again and look it up in the symbol table. The symbol table will say that "s" has been declared as a variable of type char [].
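The declare-then-look-up flow can be sketched as follows. The `SymbolTable` class, its method names, and the stored fields are hypothetical, and scopes are ignored entirely; a real compiler would keep much more information per entry.

```python
class SymbolTable:
    """Minimal single-scope symbol table (hypothetical design for illustration)."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, kind, type_):
        # Called when a declaration such as `char s[100];` is processed.
        if name in self._entries:
            raise ValueError(f"redeclaration of {name!r}")
        self._entries[name] = {"kind": kind, "type": type_}

    def lookup(self, name):
        # Called when an identifier token is seen again later; None if unknown.
        return self._entries.get(name)

table = SymbolTable()
table.declare("s", "variable", "char[100]")

# A few tokens from `strcpy(s, ...)`. Only identifiers are looked up.
# "strcpy" is not found here because this sketch never processed its declaration.
later_tokens = [("strcpy", "IDENTIFIER"), ("(", "OPEN_PAREN"), ("s", "IDENTIFIER")]
for lexeme, kind in later_tokens:
    if kind == "IDENTIFIER":
        print(lexeme, "->", table.lookup(lexeme))
```

The token carries only the lexeme and its class; everything known about "s" beyond that lives in the table entry.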
So, I would say a token is any chunk of input that is recognized by the lexical analyzer, and that only certain tokens with a special meaning are entered as symbols in the symbol table.