
Parsing text with optional data at the end

Please note: after posting this question I managed to work out a solution myself. See the end of this question for my final answer.


I'm working on a little parser for org-mode documents at the moment. In these documents a heading has a title and may optionally end with a list of tags:

* Heading          :foo:bar:baz:

I'm having difficulty writing a parser for this, however. The following is what I'm working with for now:

import Control.Applicative
import Text.ParserCombinators.Parsec

data Node = Node String [String]
            deriving (Show)

myTest = parse node "" "Some text here :tags:here:"

node = Node <$> (many1 anyChar) <*> tags

tags = (char ':') >> (sepEndBy1 (many1 alphaNum) (char ':'))
   <?> "Tag list"

While my simple tags parser works on its own, it doesn't work in the context of node, because all of the characters are consumed while parsing the title of the heading (many1 anyChar). Furthermore, I can't change this parser to use noneOf ":" because : is valid in the title. In fact, : is only special when it appears in a tag list at the very end of the line.
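To see the problem in isolation, here is a minimal sketch. It assumes Parsec 3 (the `Text.Parsec` modules), and the names `tagList`, `greedyTitle`, `tagsAlone` and `titleThenTags` are made up for illustration. Parsec does not backtrack into `many1 anyChar`, so the title swallows the whole line and the tag parser only ever sees the end of the input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The tag parser from the question, on its own.
tagList :: Parser [String]
tagList = char ':' *> sepEndBy1 (many1 alphaNum) (char ':')

-- Hypothetical name: the title parser exactly as written in the question.
greedyTitle :: Parser String
greedyTitle = many1 anyChar

-- The tag parser succeeds when it is given only the tag list...
tagsAlone :: Either ParseError [String]
tagsAlone = parse tagList "" ":tags:here:"

-- ...but after greedyTitle has run there is no input left for it.
titleThenTags :: Either ParseError (String, [String])
titleThenTags =
  parse ((,) <$> greedyTitle <*> tagList) "" "Some text here :tags:here:"
```

Here `tagsAlone` should come out as `Right ["tags","here"]`, while `titleThenTags` is a `Left` complaining about unexpected end of input.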

Any ideas how I can parse this optional data?

As an aside, this is my first real Haskell project, so if Parsec isn't the right tool for the job, feel free to point that out and suggest other options!


OK, I have a complete solution now, though it needs refactoring. The following works:

import Control.Applicative hiding (many, optional, (<|>))
import Control.Monad
import Data.Char (isSpace)
import Text.ParserCombinators.Parsec

data Node = Node { level :: Int, keyword :: Maybe String, heading :: String, tags :: Maybe [String] }
  deriving (Show)

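parseNode = Node <$> level <*> (optionMaybe keyword) <*> name <*> (optionMaybe tags)
    where -- Heading depth: count the leading stars, then skip one whitespace character.
          level = length <$> many1 (char '*') <* space
          -- Optional all-caps keyword such as TODO; try backtracks if it isn't one.
          keyword = (try (many1 upper <* space))
          -- Title: read until end of input, or until only a tag list remains (peeked at, not consumed).
          name = noneOf "\n" `manyTill` (eof <|> (lookAhead (try (tags *> eof))))
          -- Tag list such as :foo:bar:
          tags = char ':' *> many1 alphaNum `sepEndBy1` char ':'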

myTest = parse parseNode "org-mode" "** Some : text here :tags: JUST KIDDING     :tags:here:"
myTest2 = parse parseNode "org-mode" "* TODO Just a node"
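One possible way to tidy the title/tags part is sketched below. It is only a sketch under assumptions: Parsec 3 (`Text.Parsec`), a single-line input, padding made of plain spaces, and the level and keyword fields left out. The names `Heading`, `tagNames`, `tagList`, `titleEnd` and `heading` are hypothetical. The idea is the same as above (use `lookAhead` with `try` to peek for "tag list, then end of input" before each title character), but the stop condition also skips the padding, so the title comes back without trailing spaces.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record, trimmed to the two fields under discussion.
data Heading = Heading { title :: String, tagNames :: [String] }
  deriving (Show, Eq)

-- ":foo:bar:" gives ["foo","bar"]; same shape as the tags parser above.
tagList :: Parser [String]
tagList = char ':' *> sepEndBy1 (many1 alphaNum) (char ':')

-- What may follow the title: padding, an optional tag list, end of input.
-- try makes a failed attempt consume nothing, so manyTill can carry on.
titleEnd :: Parser ()
titleEnd = try (skipMany (char ' ') *> optional tagList *> eof)

heading :: Parser Heading
heading = do
  -- Take title characters until titleEnd would match; lookAhead only peeks.
  t  <- noneOf "\n" `manyTill` lookAhead titleEnd
  -- Now consume the padding and the tag list (if any) for real.
  ts <- skipMany (char ' ') *> option [] tagList <* eof
  pure (Heading t ts)
```

With this sketch, `parse heading "" "Some : text here :tags: JUST KIDDING     :tags:here:"` should give the title `"Some : text here :tags: JUST KIDDING"` and the tags `["tags","here"]`, and a heading without tags gets an empty list instead of `Nothing`.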
ocharles asked Feb 14 '11 19:02


1 Answer

import Control.Applicative hiding (many, optional, (<|>))
import Control.Monad
import Text.ParserCombinators.Parsec

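-- Parsec 2 only: Parsec 3 already ships this instance, so delete these
-- three lines there to avoid a duplicate-instance error.
instance Applicative (GenParser s a) where
  pure = return
  (<*>) = ap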

data Node = Node { name :: String, tags :: Maybe [String] }
  deriving (Show)

parseNode = Node <$> name <*> tags
  where tags = optionMaybe $ optional (string " :") *> many (noneOf ":\n") `sepEndBy` (char ':')
        name = noneOf "\n" `manyTill` try (string " :" <|> string "\n")

myTest = parse parseNode "" "Some:text here :tags:here:"
myTest2 = parse parseNode "" "Sometext here :tags:here:"

Results:

*Main> myTest
Right (Node {name = "Some:text here", tags = Just ["tags","here",""]})
*Main> myTest2
Right (Node {name = "Sometext here", tags = Just ["tags","here",""]})
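The trailing "" in both results comes from the tag element being `many`, which also accepts an empty tag after the final colon. A small sketch of the difference, assuming Parsec 3 (`Text.Parsec`) and using the made-up names `lenientTags` and `strictTags`:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- As in the answer: a tag may be empty, so the last ':' is followed by
-- one more (empty) element.
lenientTags :: Parser [String]
lenientTags = many (noneOf ":\n") `sepEndBy` char ':'

-- Requiring at least one character per tag drops that empty element.
strictTags :: Parser [String]
strictTags = many1 (noneOf ":\n") `sepEndBy` char ':'

lenientResult, strictResult :: Either ParseError [String]
lenientResult = parse lenientTags "" "tags:here:"  -- Right ["tags","here",""]
strictResult  = parse strictTags  "" "tags:here:"  -- Right ["tags","here"]
```

Swapping `many` for `many1` in the answer's tags parser should therefore remove the empty entry without changing anything else.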
Bill answered Nov 15 '22 07:11