Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avoid excessive boilerplate for matching constructors

I am currently implementing a Lexer / Parser. And one thing that bugs me is that currently half of my code in my Parser.hs is going to be dedicated to simply getting a single token:

For a small data type like this:

data Tok
    = IdLower String 
    | IdUpper String 
    | IdSymbol String
    | IdColon String
    | Equals
    | Newline

I seem to need something like this:

idLower :: Parser String
idLower = get >>= \s -> if
    | (_, IdLower n) :- xs <- s -> put xs *> pure n
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

idUpper :: Parser String
idUpper = get >>= \s -> if
    | (_, IdUpper n) :- xs <- s -> put xs *> pure n
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

idSymbol :: Parser String
idSymbol = get >>= \s -> if
    | (_, IdSymbol n) :- xs <- s -> put xs *> pure n
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

idColon :: Parser String
idColon = get >>= \s -> if
    | (_, IdColon n) :- xs <- s -> put xs *> pure n
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

equals :: Parser ()
equals = get >>= \s -> if
    | (_, Equals) :- xs <- s -> put xs
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

newline :: Parser ()
newline = get >>= \s -> if
    | (_, Newline) :- xs <- s -> put xs
    | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
    | Nil l <- s -> throwError [(l, "Unexpected end of input")]

Which is like 99% repeated code, the only difference between them is the constructor used, and whether or not I have something like pure n for the ones that have contents.

I have tried refactoring it a bit so that I have just one Tok -> Maybe () or Tok -> Maybe String function per symbol, and then made a higher order function that takes these functions as parameters. But each Tok -> Maybe a function takes at 3 lines plus 1 line of spacer, and now I need another higher order function to support it, and if I want shorthands so I can just use idLower instead of getToken idLower then I end up with just as much code total, if not more!

I am just really hoping there is an alternative to the above. Now I know I can reduce a bit of the duplication by perhaps creating an automatically failing function that will always call the relevant throwError that I can defer to if the first guard doesn't hit, but even with that this still feels pretty gross.

like image 901
semicolon Avatar asked Jan 05 '23 11:01

semicolon


1 Answers

You can get the Tok -> Maybe () and Tok -> Maybe String functions "for free" (via Template Haskell) by using prisms (e.g. from the lens library).

data Tok =
    IdLower String
  | IdUpper String
  | IdSymbol String
  | IdColon String
  | Equals
  | Newline

makePrisms ''Tok

Now you can say:

GHCi> preview _IdLower (IdLower "foo")
Just "foo"
GHCi> preview _IdLower (IdUpper "Foo")
Nothing

Then, as you suggest yourself, you can abstract from the prism in your token-specific function:

tok :: Prism' Tok a -> Parser a
tok p = get >>= \ s -> if
  | (_, t) :- xs <- s, Just n <- preview p t -> put xs *> pure n
  | (l, t) :- _ <- s -> throwError [(l, "Unexpected " <> description t)]
  | Nil l <- s -> throwError [(l, "Unexpected end of input")]

And then you can get the individual functions back by saying tok _IdLower or tok _Equals.

like image 69
kosmikus Avatar answered May 23 '23 10:05

kosmikus