I am writing a basic lexical analyser in Java for my semester project,
and I disagree with my subject teacher on one concept.
My view is that, in general, if an input like "1a" is given to a lexical analyser, it should produce the output:
"<Number><Identifier>"
But my teacher says that it should flag the whole string (i.e. "1a") as an error instead of treating it as a number and an identifier. This is because (as he says) identifiers cannot start with a number.
On the contrary, I think it should be the responsibility of the next stage of the compiler (the syntax analyser) to decide whether something is a valid identifier. I know he is right that identifiers cannot start with a number, but I need closure on whether the lexical analyser should be the one deciding that.
I will really appreciate your help. Thank you
When no token pattern matches a prefix of the remaining input, the lexical analyzer gets stuck and has to recover from this state before it can analyze the rest of the input. In simple words, a lexical error occurs when a sequence of characters does not match the pattern of any token.
A lexical error is any input that can be rejected by the lexer. This generally results from token recognition falling off the end of the rules you've defined. For example (in no particular syntax):
[0-9]+ ===> NUMBER token
[a-zA-Z] ===> LETTERS token
anything else ===> error!
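The three rules above can be sketched in Java roughly as follows. This is only an illustration, not a definitive implementation: the class name `SimpleLexer`, the token spelling `NUMBER(...)`/`LETTERS(...)`, and the choice to let LETTERS match a run of letters (the rule as written has no `+`) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleLexer {
    /** Splits the input into NUMBER and LETTERS tokens; anything else is a lexical error. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isDigit(c)) {
                // Rule 1: [0-9]+ ===> NUMBER (take the longest run of digits)
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (isLetter(c)) {
                // Rule 2: letters ===> LETTERS (assumed here to match a whole run)
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                tokens.add("LETTERS(" + input.substring(start, i) + ")");
            } else {
                // Rule 3: no rule matched, so the lexer rejects the input
                throw new IllegalArgumentException(
                        "Lexical error at position " + i + ": '" + c + "'");
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12ab")); // prints [NUMBER(12), LETTERS(ab)]
    }
}
```

Note that with only these rules, whitespace also counts as "anything else", so an input such as "12 ab" would be rejected.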
Lexical phase errors: these errors are detected during the lexical analysis phase. Typical lexical errors are exceeding the permitted length of an identifier or numeric constant, and the appearance of illegal characters.
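As a rough sketch of those two typical checks in Java: the class `LexicalErrorChecks`, the limit of 31 characters, and the set of legal characters (letters, digits, underscore) are hypothetical values chosen for this example, since every language defines its own.

```java
import java.util.ArrayList;
import java.util.List;

public class LexicalErrorChecks {
    // Hypothetical limit; real languages and compilers choose their own.
    static final int MAX_IDENTIFIER_LENGTH = 31;

    /** Returns the lexical errors found in a single lexeme (empty list if none). */
    public static List<String> check(String lexeme) {
        List<String> errors = new ArrayList<>();
        // Check 1: appearance of illegal characters
        for (int i = 0; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            boolean legal = Character.isLetterOrDigit(c) || c == '_';
            if (!legal) {
                errors.add("illegal character '" + c + "' at offset " + i);
            }
        }
        // Check 2: exceeding the permitted length
        if (lexeme.length() > MAX_IDENTIFIER_LENGTH) {
            errors.add("lexeme longer than " + MAX_IDENTIFIER_LENGTH + " characters");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(check("count"));
        System.out.println(check("a$b"));
    }
}
```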
A lexer contains a tokenizer or scanner. If the lexical analyzer detects that a token is invalid, it generates an error. The role of the lexical analyzer in compiler design is to read character streams from the source code, check for legal tokens, and pass the data to the syntax analyzer when it demands.
A lexical analyzer should deal with which kinds of tokens are legal and with dividing the text into tokens. It will error out if a string cannot form a valid token.
The syntax analyzer only deals with the structure of the program once the tokens have been determined. It will give an error if the tokens cannot be parsed according to the given grammar.
So your teacher is correct. Determining whether an identifier is legal falls under lexical analysis.
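One possible way to get that behaviour in Java is sketched below, using java.util.regex. It is a sketch under assumptions rather than the only correct design: the class name `StrictLexer`, the group names, and the exact token patterns (identifiers as a letter or underscore followed by letters, digits, or underscores) are made up for this example. The key idea is an extra rule that matches digits immediately followed by identifier characters and reports the whole lexeme as an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictLexer {
    // Order matters: the malformed-token rule is tried before the plain number rule.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<bad>[0-9]+[A-Za-z_][A-Za-z0-9_]*)"   // e.g. "1a": digits running into letters
            + "|(?<num>[0-9]+)"                      // number
            + "|(?<id>[A-Za-z_][A-Za-z0-9_]*)"       // identifier
            + "|(?<ws>\\s+)");                       // whitespace, skipped

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Lexical error: illegal character at position " + pos);
            }
            if (m.group("bad") != null) {
                // The whole lexeme is rejected, not split into <Number><Identifier>.
                throw new IllegalArgumentException(
                        "Lexical error: invalid token \"" + m.group("bad") + "\"");
            } else if (m.group("num") != null) {
                tokens.add("<Number>");
            } else if (m.group("id") != null) {
                tokens.add("<Identifier>");
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 42")); // prints [<Identifier>, <Number>]
        System.out.println(tokenize("1a"));    // throws IllegalArgumentException
    }
}
```

With this rule in place, "1 a" (with a space) still yields a number followed by an identifier, while "1a" is reported as a single invalid token.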