I'm trying to create a parser with Superpower. I have already taken a look to the samples I've found in the repo, but they are a bit difficult to understand, at least for a beginner like me :) So I came with this little challenge.
I have invented a very basic grammar just to learn. I have thought of an elevator that follows a list of instructions to go up, down and wait.
Example:
(UP 100),
(DOWN 200),
(DOWN 100),
(DOWN @1),
(UP @3),
(WAIT),
(UP 300)
As you see, it consists of a list of comma-separated verbs to move, for example, an elevator.
I really would like to learn how to create a token based parser for this grammar as a start in order to understand how to use SuperPower.
Step 1 of writing any Superpower parser is to figure out what the token kinds are. You have something like:
// ECL - Elevator Control Language ;-)
enum EclToken {
LParen,
RParen,
UpKeyword,
DownKeyword,
WaitKeyword,
AtSymbol,
Number,
Comma
}
Step 2, write a Tokenizer<EclToken>
. This is left as a direct programming task by Superpower v1 - there aren't many helpers to lean on, you just need to write the code as in the examples.
The tokenizer takes the input string, strips out the whitespace, and figures out what the sequence of tokens is.
For your example input, the first line will be:
// (UP 100),
LParen, UpKeyword, Number, RParen, Comma
For tokens like Number
that contain content, the span associated with the Result<EclToken>
will point to the portion of the input string corresponding to the token. In this line, the number will be a TextSpan
covering 100
.
Step 3 is to figure out what you want to parse the input into. For programming languages with nested expressions, this is usually an AST. In the case of the ECL sample, it's pretty simple so you might cut it down to:
struct ElevatorCommand {
public int Distance; // + or -
public bool IsRelative;
}
Step 4, the parser. This is usually embedded in a static class. The job of the parser is to build up more complex results (an ElevatorCommand[]
, here), from simpler results (number, movement).
This is where Superpower does the heavy lifting, particularly with respect to expectations and errors.
static class EclParser
{
static TokenListParser<EclToken, int> Number =
Token.EqualTo(EclToken.Number).Apply(Numerics.IntegerInt32);
}
The first thing we do is define the parser for numbers; this one applies a built-in TextParser<int>
to the content of an EclToken.Number
span.
You can see more parsing machinery in this example.
A few more clues to help you find the way (not syntax checked, let alone compiled/tested):
static TokenListParser<EclToken, ElevatorCommand> Up =
from _ in Token.EqualTo(EclToken.UpKeyword)
from distance in Number
select new ElevatorCommand {
Distance = distance,
IsRelative = false
};
static TokenListParser<EclToken, ElevatorCommand> Command =
from lp in Token.EqualTo(EclToken.LParen)
from command in Up // .Or(Down).Or(Wait)
from rp in Token.EqualTo(EclToken.RParen)
select command;
static TokenListParser<EclToken, ElevatorCommand[]> Commands =
Command.ManyDelimitedBy(Token.EqualTo(EclToken.Comma));
}
Commands
is the completed parser that you can apply to the input.
It's best to build up the parser incrementally, testing each smaller parser on chunks of input they're expected to parse.
OK, I have finally managed to get it. It wasn't so difficult with @Nicholas Blumhardt's guidance :)
I have created a project in GitHub to illustrate the scenario. Since the classes are big for a post, I'm linking to the files:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With