Could someone please post a small example of IndentParser usage? I am looking to parse YAML-like input like the following:
fruits:
    apples: yummy
    watermelons: not so yummy
vegetables:
    carrots: are orange
    celery raw: good for the jaw
I know there is a YAML package. I would like to learn the usage of IndentParser.
I've sketched out a parser below. For your problem you probably only need the block parser from IndentParser. Note that I haven't tried to run it, so it might have elementary errors.
The biggest problem for your parser is not really indenting, but that you only have strings and colon as tokens. You might find the code below takes quite a bit of debugging as it will have to be very sensitive about not consuming too much input, though I have tried to be careful about left-factoring. Because you only have two tokens there isn't much benefit you can get from Parsec's Token module.
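To make the left-factoring point concrete, here is a minimal sketch in plain Parsec (Text.Parsec from the parsec package, not IndentParser). The Item type and the names key, naive and factored are made up for this illustration. It shows why two alternatives that share a prefix cause trouble: once the first alternative has consumed input and then fails, (<|>) does not try the second.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type, used only for this illustration.
data Item = Leaf String String | Branch String
  deriving (Show, Eq)

-- The shared prefix: a key made of letters, followed by a colon.
key :: Parser String
key = many1 letter <* char ':'

-- Not left-factored: both alternatives start with 'key'. If 'leaf'
-- fails after 'key' has consumed input, (<|>) does not try 'branch'.
naive :: Parser Item
naive = leaf <|> branch
  where
    leaf   = Leaf <$> key <*> (char ' ' *> many1 (noneOf "\n"))
    branch = Branch <$> key <* newline

-- Left-factored: parse the shared prefix once, then choose based on
-- the next character, where neither alternative has consumed anything.
factored :: Parser Item
factored = do
  k <- key
  (Leaf k <$> (char ' ' *> many1 (noneOf "\n"))) <|> (Branch k <$ newline)
```

With these definitions, parse naive "" "fruits:\n" is a Left (a parse error), while parse factored "" "fruits:\n" is Right (Branch "fruits"). Wrapping the first alternative in try would also work, at the cost of re-parsing the prefix.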
Note that there is a strange truth to parsing: simple-looking formats are often not simple to parse. For learning, writing a parser for simple expressions will teach you much more than a more-or-less arbitrary text format (which might only cause you frustration).
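-- Imports, assuming the module names used by the Parsec 2 API and
-- the IndentParser package; check them against your installed versions.
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.IndentParser

data DefinitionTree = Nested String [DefinitionTree]
                    | Def String String
                    deriving (Show)
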
-- Note - this might need some testing.
--
-- This is a tricky one, the parser has to parse trailing
-- spaces and tabs but not a new line.
--
category :: IndentCharParser st String
category = do
    { a <- body
    ; _ <- char ':'
    ; rest
    ; return a
    }
  where
    -- A name is made of letters and plain spaces. Using char ' '
    -- rather than 'space' keeps the name from running over a newline.
    body = many1 (letter <|> char ' ')
    rest = many (oneOf [' ', '\t'])

-- Because the DefinitionTree data type has two quite
-- different constructors, both sharing the same prefix
-- 'category', this combinator is a bit more complicated
-- than usual, and has to use an Either type to discriminate
-- between the options.
--
definition :: IndentCharParser st DefinitionTree
definition = do
    { a <- category
    ; b <- (textL <|> definitionsR)
    ; case b of
        Left ss  -> return (Def a ss)
        Right ds -> return (Nested a ds)
    }

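-- Note this should parse a string *provided* it is on
-- the same line as the category.
--
-- However you might find this assumption needs verifying...
--
-- many1 (noneOf "\n") fails without consuming input when the
-- category is followed directly by a newline, so 'definition'
-- can still fall through to definitionsR.
--
textL :: IndentCharParser st (Either String a)
textL = do
    { ss <- many1 (noneOf "\n")
    ; optional newline
    ; return (Left ss)
    }
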
-- Finally this one uses an indent parser.
--
definitionsR :: IndentCharParser st (Either a [DefinitionTree])
definitionsR = block body
  where
    body = do { a <- many1 definition; return (Right a) }
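Since the sketch above is untested, it may help to compare it with a version that handles indentation by hand. The following is a minimal sketch in plain Parsec (Text.Parsec, no IndentParser), so it is not how the block combinator works internally. It assumes indentation is done with spaces only and that there are no blank lines. Passing the indentation level as an Int argument and the names definitions and lineEnd are my own choices for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data DefinitionTree
  = Nested String [DefinitionTree]
  | Def String String
  deriving (Show, Eq)

-- A name: letters and plain spaces up to the colon, e.g. "celery raw".
category :: Parser String
category = many1 (letter <|> char ' ') <* char ':'

-- One definition whose line is indented by exactly n spaces.
definition :: Int -> Parser DefinitionTree
definition n = do
  -- 'try' means a line with the wrong indentation consumes nothing.
  _ <- try (count n (char ' ') <* notFollowedBy (char ' '))
  k <- category
  _ <- many (char ' ')
  leaf k <|> nested k
  where
    -- Text on the same line as the category.
    leaf k = Def k <$> many1 (noneOf "\n") <* lineEnd
    -- Nothing on the same line: the children must be indented deeper.
    nested k = do
      _ <- newline
      m <- lookAhead (length <$> many (char ' '))
      if m > n
        then Nested k <$> many1 (definition m)
        else parserFail "expected an indented block"
    lineEnd = (() <$ newline) <|> eof

-- A whole document: top-level definitions start in the first column.
definitions :: Parser [DefinitionTree]
definitions = many1 (definition 0) <* eof
```

Running parse definitions "" on the example input from the question (with its indentation intact) should give two Nested nodes, "fruits" and "vegetables", each holding two Def leaves. Note that leaf and nested need no try here, because leaf fails without consuming input when the category is followed directly by a newline.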