Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar. <pre class="prettyprint"><code><foo> ::= <bar> | ... <bar> ::= a positive integer power of two </code></pre> The <code><bar></code> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the <code>choice</code> combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a <code>natural</code> parser and check the result calling <code>fail</code> when that doesn't match or <code>unexpected</code> or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state. I tried a brute force solution and wrote a combinator that uses <code>getPosition</code> and <code>setPosition</code> as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator: <pre class="prettyprint"><code>withPredicate :: (a -> Bool) -> String -> P a -> P a withPredicate pred lbl p = do ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead if ok then p -- consume the input if the value passed the predicate else fail lbl -- otherwise raise the error at the *start* of this token pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural where isPowerOfTwo = (`elem` [2^i | i<-[1..20]]) </code></pre> The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even <code>{get,set}ParserState</code> fails erase that memory. Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems? Thanks!

I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem. In my case, I here's how I worked around the problem: I solved stacked an Exception monad within my ParsecT type. <pre class="prettyprint"><code>type P m = P.ParsecT String ParSt (ExceptT Diagnostic m) </code></pre> Then I introduced a pair of combinators: (Note: <code>Loc</code> is my internal location type) <pre class="prettyprint"><code>-- stops hard on an error (no backtracking) -- which is why I say "semantic" instead of "syntax" error throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg withLoc :: Monad m => (Loc -> P m a) -> P m a withLoc pa = getLoc >>= pa </code></pre> Now in parsing I can write: <pre class="prettyprint"><code>parsePrimeNumber = withLoc $ \loc -> i <- parseInt unless (isPrime i) $ throwSemanticError loc "number is not prime!" return i </code></pre> The top level interface to run one of these monads is really nasty. <pre class="prettyprint"><code>runP :: Monad m => ParseOpts -> P m a -> String -> m (ParseResult a) runP pos pma inp = case runExceptT (P.runParserT pma (initPSt pos) "" inp) of mea -> do ea <- mea case ea of -- semantic error (throwSemanticError) Left err -> return $! PError err -- regular parse error Right (Left err) -> return $ PError (errToDiag err) -- success Right (Right a) -> return (PSuccess a []) </code></pre> I'm not terribly happy with this solution and desire something better. I wish parsec had a: <pre class="prettyprint"><code>semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a semanticCheck pred p = a <- p z <- pred a unless z $ ... somehow raise the error from the beginning of this token/parse rather than the end ... and when propagating the error up, use the end parse position, so this parse error beats out other failed parsers that make it past the beginning of this token (but not to the end) return a </code></pre>

Parsec: error message at specific location

Tags:

haskell

parsec

Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar.

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.

I tried a brute force solution and wrote a combinator that uses getPosition and setPosition as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator:

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails erase that memory.

Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems?

Thanks!

386

asked May 30 '14 22:05

Tim

3 Answers

I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:

{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
λ> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
λ>

Try it for yourself! Works as expected.

† getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.

117

answered Sep 22 '22 02:09

Mark Karpov

It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

Here's a complete example:

import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

type P a = Parsec String () a

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
    pos <- getPosition
    x <- p
    unless (pred x) $ failAt pos msg
    return x

natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

main = print $ runParser pPowerOfTwo  () "myinput" "4095"

When run, it results in:

Left "myinput" (line 1, column 1):
expecting power of two

answered Sep 22 '22 02:09

that other guy

I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.

In my case, I here's how I worked around the problem:

I solved stacked an Exception monad within my ParsecT type.

type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)

Then I introduced a pair of combinators: (Note: Loc is my internal location type)

-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg


withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa

Now in parsing I can write:

parsePrimeNumber = withLoc $ \loc ->
  i <- parseInt
  unless (isPrime i) $ throwSemanticError loc "number is not prime!"
  return i

The top level interface to run one of these monads is really nasty.

runP :: Monad m
    => ParseOpts
    -> P m a
    -> String
    -> m (ParseResult a)
runP pos pma inp = 
  case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
    mea -> do
             ea <- mea
             case ea of
               -- semantic error (throwSemanticError)
               Left err -> return $! PError err
               -- regular parse error
               Right (Left err) -> return $ PError (errToDiag err)
               -- success
               Right (Right a) -> return (PSuccess a [])

I'm not terribly happy with this solution and desire something better.

I wish parsec had a:

semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p = 
    a <- p
    z <- pred a
    unless z $
       ... somehow raise the error from the beginning of this token/parse 
       rather than the end ... and when propagating the error up, 
      use the end parse position, so this parse error beats out other 
      failed parsers that make it past the beginning of this token 
      (but not to the end)
    return a

answered Sep 22 '22 02:09

Tim

Related questions
                            
                                How to exit main in haskell given a condition
                            
                                Haskell data structure with optimized access methods
                            
                                Type classes and records as interfaces
                            
                                Solving Google Code Jam's "Minimum Scalar Product" in Haskell
                            
                                Haskell Debugging
                            
                                When can eta reduction change a function's type?
                            
                                Source code for standard typeclass instance declarations
                            
                                Libraries for sequential non-linear optimization in haskel?
                            
                                Which mathematical topics are most crucial for a deep understanding of Haskell?
                            
                                How to implement ^ operator in Haskell?
                            
                                Insert specific element y after every n elements in a list
                            
                                Dot operator in haskell with multi-parameter functions
                            
                                Why the grave accent is used when passing the mod function to map?
                            
                                How to have multiple infinite ranges in list comprehensions?
                            
                                Ravi Sethi's Little Quilt Language in Haskell
                            
                                Cannot enter multiline statements in GHCi [duplicate]
                            
                                % doesn't work when editing Haskell code in vim?
                            
                                Is it possible for pure functions in Haskell to mutate local copies of variables?
                            
                                Avoiding case expression ladder without monad transformers
                            
                                foldl Implementation with Runtime Errors

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Parsec: error message at specific location

Tags:

haskell

parsec

Tim

People also ask

3 Answers

Mark Karpov

that other guy

Tim

Recent Activity

Donate For Us