Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to write a backtracking concatenated parser in haskell

Tags:

haskell

parsec

I'm trying to get a parser created from concatenated parsers to backtrack in parsec.

here's the code:

ab = (try $ char 'a') <|> (try $ char 'b')
cd = (try $ char 'b') <|> (try $ char 'c')
abcd = (try $ many1 ab) >> (try cd)

here, when I run

parse abcd "parser"

with the input "aaaaaaaaaaaaaac", it works, but it errors on "aaaaaaaaaaaaab", which should be acceptable.

Any insights as to what I can do to get this to work

Thanks in advance

like image 416
sakekasi Avatar asked Feb 13 '23 06:02

sakekasi


1 Answers

So you're trying to write the regular expression (a|b)+(b|c) in Haskell? This is how you would do it:

import Text.Parsec.String
import Text.Parsec

ab :: Parser Char
ab = char 'a' <|> char 'b'

bc :: Parser Char
bc = char 'b' <|> char 'c'

abc :: Parser String
abc = do
    a <- ab
    b <- bc
    return [a,b]

abbc :: Parser String
abbc = try abc <|> do
    c <- ab
    s <- abbc
    return (c:s)

parser :: String -> Either ParseError String
parser = parse abbc "(a|b)+(b|c)"

Now load up ghci and run the code as follows:

aaditmshah@home:~$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Test.hs
[1 of 1] Compiling Main             ( Test.hs, interpreted )
Ok, modules loaded: Main.
*Main> parser "aaaaaaaaaaaaaac"
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.2 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-0.11.3.1 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Right "aaaaaaaaaaaaaac"
*Main> parser "aaaaaaaaaaaaaab"
Right "aaaaaaaaaaaaaab"
*Main> 
Leaving GHCi.

How does this work? Let's understand it parser by parser:

ab :: Parser Char
ab = char 'a' <|> char 'b'

bc :: Parser Char
bc = char 'b' <|> char 'c'

These two parsers are simple. They correspond to the regular expressions (a|b) and (b|c) respectively.

abc :: Parser String
abc = do
    a <- ab
    b <- bc
    return [a,b]

This is also relatively simple. It corresponds to the regular expression (a|b)(b|c).

abbc :: Parser String
abbc = try abc <|> do
    c <- ab
    s <- abbc
    return (c:s)

This is where things get complicated. This parser does the following:

  1. It tries to match the input with the regular expression (a|b)(b|c).
  2. If it fails then it doesn't consume any input. Instead it matches the first character of input with (a|b) and then goes back to step 1 to match the rest of the input.

Hence it corresponds to the regular expression (a|b)*(a|b)(b|c) which is the same as (a|b)+(b|c).

The try combinator is important. Without it the parser will try to match the input with (a|b)(b|c). Failing to do so it will throw an error instead of backtracking and trying the alternate condition.

like image 168
Aadit M Shah Avatar answered Apr 01 '23 18:04

Aadit M Shah