Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell parsec parsing a string of items

Tags:

haskell

parsec

I have a list that I need to parse where the all but the last element needs to be parsed by one parser, and the last element needs to be parsed by another parser.

a = "p1 p1b ... p2"
or
a = "p2"

Originally I tried

parser = do parse1 <- many parser1
            parse2 <- parser2
            return AParse parse1 parse2

The problem is that parse1 can consume a parse2 input. So parse1 always consumes the entire list, and leave parse2 with nothing.

Is there a way to say to apply parse1 to everything besides the last element in a string, and then apply parse2?

like image 214
Chris Avatar asked Mar 15 '10 16:03

Chris


2 Answers

How about:

parseTrain car caboose = choice
    [ fmap (:[]) $ try (caboose `endBy` eof), 
    , liftM2 (:) car (parseTrain car caboose) 
    [

The eof bugs me, since that makes this parser not compositional. I.e. you couldn't say:

char '(' >> parseTrain p1 p2 >> char ')'

Doing this compsitionally is very hard for a parser. How is it supposed to know to move on to char ')', without trying to at every opportunity and seeing if it fails? Doing so could exponential time.

If you need it to be compositional, does your problem have some additional structure you can exploit? Can you, for example, parse a list of all elements and then process the last one after the fact?

like image 107
luqui Avatar answered Nov 08 '22 22:11

luqui


If you can factor parser1 so that is defined like so:

parser1 = (try parser2) <|> parser1extra

Then the problem becomes a list of parser1extra or parser2 that must end in the later. You can code that as:

parserList =
    liftM2 (:) (try parser1extra) parserList
    <|>
    liftM2 (:) (try parser2) (option [] parserList)

You may or may not need the try calls depending on if those parsers have any prefix overlap.

If you don't want the return value to be a list, but instead your AParse datum, then you could re-write it this way:

parserList =
    do
        a <- try parser1extra
        prefix a parserList
    <|>
    do
        a <- try parser2
        option (AParse [] a) (prefix a parserList)

    where prefix a p = do
            (AParse as t) <- p
            return $ (AParse (a:as) t)

Or, a full example:

import Control.Monad
import Text.ParserCombinators.Parsec

parseNum = do { v <- many1 digit; spaces; return v }
parseWord = do { v <- many1 letter; spaces; return v }
parsePart = parseNum <|> parseWord

parsePartListEndingInWord =
    liftM2 (:) (try parseNum) parsePartListEndingInWord
    <|>
    liftM2 (:) (try parseWord) (option [] parsePartListEndingInWord)

Actually, the calls to try aren't needed in this case, as parseNum and parseWord match no common prefix. Notice that parsePartListEndingInWord doesn't actually reference parsePart, but instead, the two options that make up parsePart's definition


(Original answer, solving a somewhat different situation:)

How about something like:

parserTest = between (char '[') (char ']') $ do
    p1s <- try parser1 `endBy` char ',' 
    p2 <- parser2
    return $ AParse p1s p2

Taking the punctuation out of your parsers and up into parseTest allows you to use the combinators between and endBy to do the work for you. Lastly, the try is there so that if parser1 and parser2 match a common prefix, endBy will perform the correct full backup to beginning of the common prefix.

Depending on your parsers, it is possible that you can leave the punctuation matching inside your sub-parsers, and all you need might be the a try around parser1:

parseTest = do parse1 <- many (try parser1)
               parse2 <- parser2
               return AParse parse1 parse2
like image 39
MtnViewMark Avatar answered Nov 08 '22 20:11

MtnViewMark