Attoparsec: skipping up to (but not including) a multi-char delimiter

I have a string that can contain pretty much any character. Inside the string there is the delimiter {{{.

For example: afskjdfakjsdfkjas{{{fasdf.

Using attoparsec, what is the idiomatic way of writing a Parser () that skips all characters before {{{, but without consuming the {{{?

asked Oct 19 '22 11:10 by danidiaz



1 Answer

Use attoparsec's lookAhead (which applies a parser without consuming any input) and manyTill to write a parser that consumes everything up to (but excluding) a {{{ delimiter. You're then free to apply that parser and throw its result away.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ( (<|>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

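-- Collect everything before the first "{{{"; lookAhead leaves the delimiter unconsumed.
myParser :: Parser Text
myParser = T.concat <$> manyTill (nonOpBraceSpan <|> opBraceSpan)
                                 (lookAhead $ string "{{{")
                    <?> "{{{"
  where
    opBraceSpan    = takeWhile1 (== '{')  -- a run of one or more '{'
    nonOpBraceSpan = takeWhile1 (/= '{')  -- a run of anything except '{'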

In GHCi (parseTest prints the unconsumed input first, then the parsed result):

λ> :set -XOverloadedStrings 
λ> parseTest myParser "{foo{{bar{{{baz"
Done "{{{baz" "{foo{{bar"
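
The question asks for a Parser (), so the result of the parser above has to be discarded. A minimal sketch of one way to do that, using void from Control.Monad; the names skipToDelim and restAfterSkip are hypothetical and only for illustration, not part of the answer:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (void)
import Data.Attoparsec.Text
import Data.Text (Text)

-- | Hypothetical helper: skip input up to, but not including, the first
-- "{{{". Fails if the delimiter never appears in the input.
skipToDelim :: Parser ()
skipToDelim = void (manyTill chunk (lookAhead (string "{{{")))
  where
    -- Either a run of non-brace characters or a run of opening braces.
    chunk = takeWhile1 (/= '{') <|> takeWhile1 (== '{')

-- | Hypothetical usage: skip the prefix, then return whatever input is left.
restAfterSkip :: Text -> Either String Text
restAfterSkip = parseOnly (skipToDelim *> takeText)

main :: IO ()
main = print (restAfterSkip "afskjdfakjsdfkjas{{{fasdf")
-- prints: Right "{{{fasdf"
```

The remaining input still starts with {{{, so a following parser can match the delimiter itself.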
answered Oct 23 '22 01:10 by jub0bs
