Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Struggling with Applicative parsing

So I'm having a go at writing a complex parser, using only Applicative (the parser in question doesn't even implement Monad at all).

For trivial parsers, this is quite easy. For non-trivial ones... not so much. The applicative interface seems to violently force you to write everything in point-free style. This is extremely difficult to deal with.

Consider, for example:

call = do
  n <- name
  char '('
  trim
  as <- sepBy argument (char ',' >> trim)
  char ')'
  trim
  char '='
  r <- result
  return $ Call {name = n, args = as, result = r}

Now let's try to write that using applicative:

call =
  (\ n _ _ as _ _ _ _ r -> Call {name = n, args = as, result = r}) <$>
  name <*>
  char '(' <*>
  trim <*>
  sepBy argument (const const () <$> char ',' <*> trim) <*>
  char ')' <*>
  trim <*>
  char '=' <*>
  trim <*>
  result

Applicative has forced me to put the variable bindings very far away from where the actual parser is. (E.g., try to confirm that as is actually bound to sepBy argument ...; it's not at all easy to verify that I haven't got the wrong count of _ patterns!)

Another very unintuitive thing is that <*> applies a function to a value, but *> and <* are just pure sequencing. This took for ages to wrap my mind around. Different method names would have made this far, far clearer. (But Monad seems to have grabbed >> and <<, sadly.) It seems that these can be stacked, yielding things like

exit =
  "EXIT:" *>
  trim *>
  name <*
  char '(' <*
  trim <*
  char ')' <*
  trim

It's rather non-obvious that you can do this. And, to me, this code really isn't terribly readable. More importantly, I still haven't figured out how you deal with collecting multiple values while dropping multiple other values.

In all, I find myself wishing I could just use do-notation! I don't actually need to change effects based on prior results; I don't need the power of Monad. But the notation is so much more readable. (I keep wondering whether it would actually be feasible to implement this; can you syntactically tell when a particular do-block can be mechanically transformed to applicative?)

Does anybody know of a way around these problems? Most particularly, how can I move the variable bindings closer to the parser they bind to?

like image 990
MathematicalOrchid Avatar asked Dec 16 '14 11:12

MathematicalOrchid


1 Answers

Well, your example parser is artificially complicated.

There are lots of occurrences of trim that you can abstract from:

token p = p <* trim

You can also abstract from something that occurs between a pair of matching parentheses:

parens p = token (char '(') *> p <* token (char ')')

Now what's left is:

call =
  (\ n as _ r -> Call {name = n, args = as, result = r}) <$>
  name <*>
  parens (sepBy argument (() <$ token (char ','))) <*>
  token (char '=') <*>
  result

Finally, you shouldn't count occurrences of _, but rather learn to use <$ and <*. Here are useful rules of thumb:

  • Only use *> in the combination foo *> p <* bar such as in parens above, nowhere else.

  • Make your parsers have the form f <$> p1 <*> ... <*> pn, and now choose between <$> and <$ in the first position or between <*> and <* in all other positions purely on the question of whether you're interested in the result of the subsequent parser. If you are, use the variant with the >, otherwise, use the one without. Then you never need to ignore any arguments in f, because you're not even getting access to them. In the reduced example case above, only the = token is left that we're not interested in, so we can say

    call = Call <$> name
                <*> parens (sepBy argument (() <$ token (char ',')))
                <*  token (char '=')
                <*> result
    

(This is assuming that Call actually takes only these three arguments.) I'd argue that this version is easier to read even than your original do-based version.

To answer your more general question: Yes, it's possible to recognize do-statements that don't need the power of monads. Simply put, they're the ones which are just a sequence of bindings with a return in the very end, and all of the bound variables are only used in the final return and nowhere else. There is a proposal to add this to GHC. (Personally, I'm not a huge fan of it, however. I think applicative notation is more functional than do-notation.)

like image 119
kosmikus Avatar answered Sep 28 '22 18:09

kosmikus