I am currently implementing a Lexer / Parser. And one thing that bugs me is that currently half of my code in my Parser.hs is going to be dedicated to simply getting a single token:
For a small data type like this:
data Tok
= IdLower String
| IdUpper String
| IdSymbol String
| IdColon String
| Equals
| Newline
I seem to need something like this:
idLower :: Parser String
idLower = get >>= \s -> if
| (_, IdLower n) :- xs <- s -> put xs *> pure n
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
idUpper :: Parser String
idUpper = get >>= \s -> if
| (_, IdUpper n) :- xs <- s -> put xs *> pure n
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
idSymbol :: Parser String
idSymbol = get >>= \s -> if
| (_, IdSymbol n) :- xs <- s -> put xs *> pure n
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
idColon :: Parser String
idColon = get >>= \s -> if
| (_, IdColon n) :- xs <- s -> put xs *> pure n
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
equals :: Parser ()
equals = get >>= \s -> if
| (_, Equals) :- xs <- s -> put xs
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
newline :: Parser ()
newline = get >>= \s -> if
| (_, Newline) :- xs <- s -> put xs
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
Which is like 99% repeated code, the only difference between them is the constructor used, and whether or not I have something like pure n
for the ones that have contents.
I have tried refactoring it a bit so that I have just one Tok -> Maybe ()
or Tok -> Maybe String
function per symbol, and then made a higher order function that takes these functions as parameters. But each Tok -> Maybe a
function takes at 3 lines plus 1 line of spacer, and now I need another higher order function to support it, and if I want shorthands so I can just use idLower
instead of getToken idLower
then I end up with just as much code total, if not more!
I am just really hoping there is an alternative to the above. Now I know I can reduce a bit of the duplication by perhaps creating an automatically failing function that will always call the relevant throwError
that I can defer to if the first guard doesn't hit, but even with that this still feels pretty gross.
You can get the Tok -> Maybe ()
and Tok -> Maybe String
functions "for free" (via Template Haskell) by using prisms (e.g. from the lens library).
data Tok =
IdLower String
| IdUpper String
| IdSymbol String
| IdColon String
| Equals
| Newline
makePrisms ''Tok
Now you can say:
GHCi> preview _IdLower (IdLower "foo")
Just "foo"
GHCi> preview _IdLower (IdUpper "Foo")
Nothing
Then, as you suggest yourself, you can abstract from the prism in your token-specific function:
tok :: Prism' Tok a -> Parser a
tok p = get >>= \ s -> if
| (_, t) :- xs <- s, Just n <- preview p t -> put xs *> pure n
| (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
| Nil l <- s -> throwError [(l, "Unexpected end of input")]
And then you can get the individual functions back by saying tok _IdLower
or tok _Equals
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With