I am making a lexer. Don't tell me not to, because I've already done most of it.
Currently it makes an array of tokens and that's it.
I would like to know what functions the lexer needs to provide, with a brief explanation of what each function needs to do.
I'll accept the most complete list.
An example function would be:
next: Consume the current token and return it
Also, should the lexer have an expect function, or should the interpreter implement it?
By the way, the lexer constructor accepts a string as its argument, performs the lexical analysis, and stores all the tokens in the "tokens" variable.
The language is JavaScript, so I can't overload operators.
In my experience, you need:
nextToken: move forward in the input and get the next token.
curToken: return the current token; don't move.
curValue: tokens like STRING and NUMBER have values; tokens like SEMICOLON don't.
sourcePos: return the source position (line number, character position) of the first character of the current token.

edit: oh, also:
prefetch: initialize the lexer by getting the first token.

Additionally, for some languages you might want 2 or more tokens of lookahead. Then you'd want a variation on plain curToken so that you can look at a bigger "window" on the token stream. For most languages that's not really necessary, however.
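To make the list above concrete, here is a minimal sketch of how those functions might look on top of a pre-built token array like the one you already have. The token shape ({ type, value, line, col }), the EOF sentinel, and the peek name for the lookahead variant are my own assumptions for illustration, and the constructor takes the token array directly instead of a source string just to keep the example short.

```javascript
// Hypothetical token shape: { type, value, line, col }.
// value is undefined for tokens such as SEMICOLON that carry no payload.
const EOF = { type: "EOF", value: undefined, line: -1, col: -1 };

class Lexer {
  // Assumption: the tokens were already produced by the scanning step.
  // A real constructor would take the source string and build this array.
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0; // "prefetch": the first token is current from the start
  }

  // Return the current token without moving.
  curToken() {
    return this.pos < this.tokens.length ? this.tokens[this.pos] : EOF;
  }

  // Move forward one token and return the new current token.
  nextToken() {
    if (this.pos < this.tokens.length) this.pos++;
    return this.curToken();
  }

  // Value of the current token (undefined for valueless tokens).
  curValue() {
    return this.curToken().value;
  }

  // Position of the first character of the current token.
  sourcePos() {
    const { line, col } = this.curToken();
    return { line, col };
  }

  // Lookahead: the token n places past the current one, without moving.
  peek(n = 1) {
    const i = this.pos + n;
    return i < this.tokens.length ? this.tokens[i] : EOF;
  }
}

const lexer = new Lexer([
  { type: "NUMBER", value: 42, line: 1, col: 1 },
  { type: "SEMICOLON", value: undefined, line: 1, col: 3 },
]);
console.log(lexer.curToken().type);  // NUMBER
console.log(lexer.curValue());       // 42
console.log(lexer.peek().type);      // SEMICOLON
console.log(lexer.nextToken().type); // SEMICOLON
console.log(lexer.sourcePos());      // { line: 1, col: 3 }
console.log(lexer.nextToken().type); // EOF
```

Returning a sentinel EOF token at the end, instead of undefined, is one possible design choice; it lets the parser compare token types without checking for a missing token first.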
edit again: also, I won't tell you not to write one, because they're basically the funnest things ever. In JavaScript you can't get too crazy, but in a language like Erlang you can have your lexer act like a "token pump" by making it generate a stream of tokens that it sends to a separate parser process.