For our compiler theory class, we are tasked with creating a simple interpreter for a programming language of our own design. I am using JFlex and CUP as my generators, but I'm a bit stuck on what a lexical error actually is. Also, is it recommended that I use the state feature of JFlex? It feels wrong, as it seems like the parser is better suited to handling that aspect. And do you recommend any other tools for creating the language? I'm sorry if I'm being impatient, but it's due on Tuesday.
Error recovery in the lexical analyzer can take several forms:
- Panic mode: successive characters are skipped until a well-formed token can begin again.
- Deleting one character from the remaining input.
- Inserting a missing character into the remaining input.
- Replacing a character with another character.
A lexical error is a sequence of characters that does not match the pattern of any token. Lexical-phase errors are detected during scanning (lexical analysis), not during execution of the program.
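For example, in JFlex the simplest panic-mode recovery is a lowest-priority catch-all rule that reports the offending character and resumes scanning. This is only a sketch: [^] is JFlex's pattern for "any single character", and the %line/%column directives enable position tracking.

/* In the options section of the JFlex spec: */
%line
%column

/* Last rule in the rules section: it matches a single character only when
   no other rule does, reports the lexical error, and resumes scanning
   at the next character (panic mode). */
[^]    { System.err.println("Lexical error at line " + (yyline + 1)
             + ", column " + (yycolumn + 1)
             + ": unexpected character '" + yytext() + "'"); }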
A lexical error is any input that can be rejected by the lexer. This generally results from token recognition falling off the end of the rules you've defined. For example (in no particular syntax):
[0-9]+ ===> NUMBER token
[a-zA-Z] ===> LETTERS token
anything else ===> error!
If you think about a lexer as a finite state machine that accepts valid input strings, then errors are going to be any input strings that do not result in that finite state machine reaching an accepting state.
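Written as an actual JFlex specification wired to CUP, that rule set might look like the sketch below. It assumes CUP has generated a sym class with NUMBER and LETTERS constants; the %cup directive makes the generated scanner return java_cup.runtime.Symbol objects.

%%
%class Lexer
%cup
%%
[0-9]+        { return new java_cup.runtime.Symbol(sym.NUMBER, yytext()); }
[a-zA-Z]+     { return new java_cup.runtime.Symbol(sym.LETTERS, yytext()); }
[ \t\r\n]+    { /* skip whitespace */ }
/* anything else: no rule matches, so this is a lexical error */
[^]           { throw new RuntimeException("Lexical error: illegal character '" + yytext() + "'"); }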
The rest of your question was rather unclear to me. If you already have some tools you are using, then perhaps you're best off learning how to achieve what you want with those tools (I have no experience with either of the tools you mentioned).
EDIT: Having re-read your question, there's a second part I can answer. It is possible for a language to have no lexical errors: it would be a language in which any input string at all is valid input.
A lexical error could be a character that is invalid or unacceptable in the language, like '#' in Java (outside a string literal or comment): no token pattern matches it, so the scanner reports it as an illegal character.
Lexical errors are the errors thrown by your lexer when it is unable to continue, meaning there is no way to recognise the next lexeme as a valid token for your lexer. Syntax errors, on the other hand, are thrown by your parser when a given sequence of already-recognised, valid tokens does not match the right-hand side of any of your grammar rules.
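For instance, with a CUP grammar fragment like the sketch below, input such as "1 + +" lexes into perfectly valid NUMBER and PLUS tokens, but the token sequence matches no production, so it is the parser, not the lexer, that reports the error.

terminal PLUS;
terminal NUMBER;
non terminal expr;

precedence left PLUS;

expr ::= expr PLUS expr
       | NUMBER
       ;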
"it feels wrong as it seems like the parser is better suited to handling that aspect"
No. It seems that way because context-free languages include regular languages (meaning that a parser can do the work of a lexer). But consider that a parser is a stack automaton, so you would be spending extra computing resources (the stack) to recognise something that does not require a stack to be recognised (a regular expression). That would be a suboptimal solution.
NOTE: by regular expression, I mean a regular expression in the Chomsky hierarchy sense, not a java.util.regex.* class.
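To make the resource argument concrete: [0-9]+ can be recognised by a finite automaton with no memory beyond its current state, but arbitrarily nested parentheses cannot, because something has to track the nesting depth, and that is exactly what the parser's stack is for. In a CUP grammar (assuming LPAREN and RPAREN are declared terminals), the nested case is a rule such as:

expr ::= LPAREN expr RPAREN
       | NUMBER
       ;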