
Parsec how to find "matches" within a string

Tags:

haskell

parsec

How can I use parsec to parse all matched input in a string and discard the rest?

Example: I have a simple number parser, and I can find all the numbers if I know what separates them:

num :: Parser Int
num = read <$> many1 digit

parse (num `sepBy` space) "" "111 4 22"

But what if I don't know what is between the numbers?

"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."

many anyChar doesn't work as a separator, because it consumes everything.

So how can I get things that match an arbitrary parser surrounded by things I want to ignore?


EDIT: Note that in the real problem, my parser is more complicated:

optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space
Sean Clark Hess asked Apr 09 '15 21:04


4 Answers

The replace-megaparsec package allows you to split up a string into sections which match your pattern and sections which don't match by using the sepCap parser combinator.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char

let num :: Parsec Void String Int
    num = read <$> some digitChar
>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]
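If only the matches are wanted, the Left sections can be dropped afterwards. A minimal sketch using rights from Data.Either; the sections list is copied by hand from the output above rather than produced by running sepCap, and the names sections and matches are made up for illustration:

```haskell
import Data.Either (rights)

-- Hand-copied from the sepCap output above (not computed here).
sections :: [Either String Int]
sections =
  [ Left "I will live to be "
  , Right 111
  , Left " years <b>old</b> if I work out "
  , Right 4
  , Left " days a week starting at "
  , Right 22
  , Left "."
  ]

-- Keep only the matched values and discard the text in between.
matches :: [Int]
matches = rights sections
```

Here matches evaluates to [111,4,22].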
James Brock answered Sep 26 '22 16:09


For an arbitrary parser myParser, it's quite easy:

solution = many (let one = myParser <|> (anyChar >> one) in one)

It might be clearer to write it this way:

solution = many loop
    where 
        loop = myParser <|> (anyChar >> loop)

Essentially, this defines a recursive parser (called loop) that skips characters one at a time until it reaches the first thing that myParser can parse. many then applies loop repeatedly until it fails, i.e. at EOF.
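Two caveats seem worth noting with loop as written: if myParser consumes some input before failing, the alternative is never tried, and if non-matching text follows the last match, the final loop fails after consuming input, so many fails as well. A minimal sketch of one way around both, assuming parsec's Text.Parsec.String and using num as a stand-in for myParser (findNext and allMatches are hypothetical names):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for myParser: one or more digits read as an Int.
num :: Parser Int
num = read <$> many1 digit

-- Skip characters until p matches; try rewinds when p fails midway.
findNext :: Parser a -> Parser a
findNext p = try p <|> (anyChar >> findNext p)

-- Collect every match. The outer try rewinds the last, unsuccessful search,
-- and many anyChar then discards whatever follows the final match.
allMatches :: Parser a -> Parser [a]
allMatches p = many (try (findNext p)) <* many anyChar
```

With the sentence from the question, parse (allMatches num) "" input yields Right [111,4,22]. The sketch assumes the item parser always consumes input when it succeeds; parsec's many raises an error otherwise.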

AJF answered Sep 22 '22 16:09


You can use

many (noneOf "0123456789")

as the separator. Note that noneOf takes a list of characters rather than a parser such as digit, so the equivalent using isDigit from Data.Char is

many (satisfy (not . isDigit))
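The separator idea can be sketched as follows, assuming parsec and the simple number case, where a single character is enough to tell junk from the start of a match (junk and numbers are hypothetical names). It would not carry over directly to the more complicated optionTag parser:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more digits read as an Int.
num :: Parser Int
num = read <$> many1 digit

-- A non-empty run of characters that cannot start a number.
junk :: Parser String
junk = many1 (noneOf ['0' .. '9'])

-- Skip optional leading junk, then read numbers separated
-- (and optionally followed) by junk.
numbers :: Parser [Int]
numbers = optional junk *> (num `sepEndBy` junk)
```

On the sentence from the question, parse numbers "" input gives Right [111,4,22].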
Gabriel Ciubotaru answered Sep 22 '22 16:09


To find the item in a string, note that either the item is at the start of the string, or you can consume one character and look for the item in the now-shorter string. If the item isn't right at the start, the item parser may have consumed some characters before failing; those need to be un-consumed, so wrap it in try.

hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>

This code looks for just one occurrence of item in the string.

(AJFarmar almost has it.)
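To illustrate why try matters here, a small sketch assuming parsec, with a hypothetical item that matches the literal tag <b> and therefore consumes '<' before it can fail on a different tag (withTry and withoutTry are made-up names):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical item: the literal tag "<b>", written character by character
-- so that it consumes '<' before it can fail on the next character.
item :: Parser ()
item = char '<' >> char 'b' >> char '>' >> return ()

-- With try, a failed attempt rewinds and the search moves on one character.
withTry :: Parser ()
withTry = try item <|> (anyChar >> withTry)

-- Without try, a failure after consuming '<' aborts the whole search.
withoutTry :: Parser ()
withoutTry = item <|> (anyChar >> withoutTry)
```

On the input "<i>x<b>", parse withTry succeeds, while parse withoutTry returns a Left because item has already consumed the '<' of "<i>" when it fails.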

Neil Smith answered Sep 23 '22 16:09