How can I use parsec to parse all matched input in a string and discard the rest?
Example: I have a simple number parser, and I can find all the numbers if I know what separates them:
num :: Parser Int
num = read <$> many1 digit  -- many1: at least one digit, so read never sees ""
parse (num `sepBy` space) "" "111 4 22"
But what if I don't know what is between the numbers?
"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
many anyChar
doesn't work as a separator, because it consumes everything.
So how can I get things that match an arbitrary parser surrounded by things I want to ignore?
EDIT: Note that in the real problem, my parser is more complicated:
optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space
The replace-megaparsec package lets you split a string into sections that match your pattern and sections that don't, using the sepCap parser combinator.
import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
let num :: Parsec Void String Int
    num = read <$> some digitChar
>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]
For an arbitrary parser myParser, it's quite easy:
solution = many (let one = myParser <|> (anyChar >> one) in one)
It might be clearer to write it this way:
solution = many loop
  where
    loop = myParser <|> (anyChar >> loop)
Essentially, this defines a recursive parser (called loop) that keeps skipping characters until it reaches the first thing that myParser can parse. many then repeats that search until it fails, i.e. at EOF.
You can use
many ( noneOf "0123456789")
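I'm not sure about the types of noneOf and digit. Since noneOf takes a list of characters rather than a parser, a version that excludes digits by predicate (isDigit comes from Data.Char) would be
many (satisfy (not . isDigit))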
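As a rough sketch of how this could look for the numbers example in Parsec: treat every non-digit character as filler and read the numbers in between. The names junk, numbers and extractNumbers are made up for illustration, and the approach only works when the filler can be described by a character class, so it would not carry over directly to something like the optionTag parser.

```haskell
import Data.Char (isDigit)
import Text.Parsec
import Text.Parsec.String (Parser)

-- One number: at least one digit, so `read` never sees an empty string.
num :: Parser Int
num = read <$> many1 digit

-- Hypothetical helper: any run of non-digit characters counts as filler.
junk :: Parser ()
junk = skipMany (satisfy (not . isDigit))

-- Skip leading filler, then read numbers, each followed by optional filler.
numbers :: Parser [Int]
numbers = junk *> many (num <* junk) <* eof

-- Hypothetical wrapper that runs the parser on a plain String.
extractNumbers :: String -> Either ParseError [Int]
extractNumbers = parse numbers ""
```

Running extractNumbers on the sentence from the question should yield the three numbers in order, with everything else dropped.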
To find the item in the string, note that either the item is at the start of the string, or you consume one character and look for the item in the now-shorter string. If the item isn't right at the start of the string, you'll need to un-consume the characters used while looking for it, so you'll need a try block.
hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>
This code looks for just one occurrence of item in the string.
(AJFarmar almost has it.)
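For what it's worth, the same idea can be stretched to collect every occurrence rather than just one. This is only a sketch under a few assumptions: findNext and findAll are names invented here, num stands in for your own item parser, and the item parser must consume at least one character whenever it succeeds (otherwise many will complain about a parser that accepts the empty string).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip ahead one character at a time until `p` matches. The outer `try`
-- rewinds the skipped characters if `p` matches nowhere in the rest of
-- the input, so the failure does not consume anything.
findNext :: Parser a -> Parser a
findNext p = try loop
  where
    loop = try p <|> (anyChar *> loop)

-- Collect every match of `p`, then discard whatever trails the last one.
findAll :: Parser a -> Parser [a]
findAll p = many (findNext p) <* skipMany anyChar

-- Example item parser, standing in for your own.
num :: Parser Int
num = read <$> many1 digit
```

With this, parse (findAll num) "" applied to the sentence from the question should return the three numbers. The backtracking at the tail of the input means the search can rescan the trailing text, which is fine for short strings but worth keeping in mind for large inputs.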