Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsec: error message at specific location

Tags:

haskell

parsec

Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar.

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.

I tried a brute force solution and wrote a combinator that uses getPosition and setPosition as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator:

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails erase that memory.

Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems?

Thanks!

like image 386
Tim Avatar asked May 30 '14 22:05

Tim


People also ask

How do I fix error code 14003 Parsec?

If you get error -14003, turn off WDDM graphics in favor of XDDM: Run "Edit group policy" Navigate to Local Computer Policy > Computer Configuration > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Remote Session Environment.

Why is Parsec not working?

Parsec needs a display to be able to capture it, and it must be on currently. Check that computer physically and make sure that the screen is not turned off. If your laptop's tray is closed for example, the screen is automatically turned off and Parsec will not have a display.


3 Answers

I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:

{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
λ> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
λ>

Try it for yourself! Works as expected.


getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.

like image 117
Mark Karpov Avatar answered Sep 22 '22 02:09

Mark Karpov


It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

Here's a complete example:

import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

type P a = Parsec String () a

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
    pos <- getPosition
    x <- p
    unless (pred x) $ failAt pos msg
    return x

natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

main = print $ runParser pPowerOfTwo  () "myinput" "4095"

When run, it results in:

Left "myinput" (line 1, column 1):
expecting power of two
like image 33
that other guy Avatar answered Sep 22 '22 02:09

that other guy


I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.

In my case, I here's how I worked around the problem:

I solved stacked an Exception monad within my ParsecT type.

type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)

Then I introduced a pair of combinators: (Note: Loc is my internal location type)

-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg


withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa

Now in parsing I can write:

parsePrimeNumber = withLoc $ \loc ->
  i <- parseInt
  unless (isPrime i) $ throwSemanticError loc "number is not prime!"
  return i

The top level interface to run one of these monads is really nasty.

runP :: Monad m
    => ParseOpts
    -> P m a
    -> String
    -> m (ParseResult a)
runP pos pma inp = 
  case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
    mea -> do
             ea <- mea
             case ea of
               -- semantic error (throwSemanticError)
               Left err -> return $! PError err
               -- regular parse error
               Right (Left err) -> return $ PError (errToDiag err)
               -- success
               Right (Right a) -> return (PSuccess a [])

I'm not terribly happy with this solution and desire something better.

I wish parsec had a:

semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p = 
    a <- p
    z <- pred a
    unless z $
       ... somehow raise the error from the beginning of this token/parse 
       rather than the end ... and when propagating the error up, 
      use the end parse position, so this parse error beats out other 
      failed parsers that make it past the beginning of this token 
      (but not to the end)
    return a
like image 38
Tim Avatar answered Sep 22 '22 02:09

Tim