Using Parsec 3.1, it is possible to parse several types of inputs:
- [Char] with Text.Parsec.String
- Data.ByteString with Text.Parsec.ByteString
- Data.ByteString.Lazy with Text.Parsec.ByteString.Lazy

I don't see anything for the Data.Text module. I want to parse Unicode content without suffering from the String inefficiencies. So I've created the following module based on the Text.Parsec.ByteString module:
    {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
    {-# OPTIONS_GHC -fno-warn-orphans #-}

    module Text.Parsec.Text
        ( Parser, GenParser
        ) where

    import Text.Parsec.Prim

    import qualified Data.Text as T

    instance (Monad m) => Stream T.Text m Char where
        uncons = return . T.uncons

    type Parser = Parsec T.Text ()
    type GenParser t st = Parsec T.Text st
Additional comments:
- I had to add the {-# LANGUAGE NoMonomorphismRestriction #-} pragma to my parser modules to make it work.
- Parsing Text is one thing; building an AST with Text is another. I will also need to pack my String before returning it:
    module TestText where

    import Data.Text as T

    import Text.Parsec
    import Text.Parsec.Prim
    import Text.Parsec.Text

    input = T.pack "xxxxxxxxxxxxxxyyyyxxxxxxxxxp"

    parser = do
      x1 <- many1 (char 'x')
      y <- many1 (char 'y')
      x2 <- many1 (char 'x')
      return (T.pack x1, T.pack y, T.pack x2)

    test = runParser parser () "test" input
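The repeated packing step can be factored into a small helper. The following is only a sketch, assuming a Parser type for Text input exported from a module named Text.Parsec.Text, as in the module defined in this question. The names packed1, xyx and runXyx are hypothetical and introduced purely for illustration. Giving each parser an explicit type signature may also remove the need for NoMonomorphismRestriction, although that depends on where the original type errors came from.

```haskell
import           Data.Text        (Text)
import qualified Data.Text        as T
import           Text.Parsec      (ParseError, char, many1, parse)
import           Text.Parsec.Text (Parser)

-- Hypothetical helper: apply a Char parser one or more times
-- and pack the matched characters into a Text value.
packed1 :: Parser Char -> Parser Text
packed1 p = T.pack <$> many1 p

-- Same grammar as the test module: a run of 'x', a run of 'y', a run of 'x'.
-- The explicit signature fixes the stream type to Text.
xyx :: Parser (Text, Text, Text)
xyx = do
  x1 <- packed1 (char 'x')
  y  <- packed1 (char 'y')
  x2 <- packed1 (char 'x')
  return (x1, y, x2)

-- Run the parser on a Text input, with "test" as the source name.
runXyx :: Text -> Either ParseError (Text, Text, Text)
runXyx = parse xyx "test"
```

Note that many1 still builds an intermediate String before T.pack is applied, so this tidies the call sites rather than removing the String allocation.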
Since Parsec 3.1.2, support for Data.Text is built in. See http://hackage.haskell.org/package/parsec-3.1.2
If you are stuck with an older version, the code snippets in the other answers are helpful, too.
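With the built-in instance, no orphan instance has to be written by hand. Here is a minimal sketch, assuming parsec 3.1.2 or later and the text package are available. The grammar (a name, an equals sign, a number) and the names assignment and parseAssignment are made up for illustration.

```haskell
import           Data.Text        (Text)
import qualified Data.Text        as T
import           Text.Parsec      (ParseError, char, digit, eof, letter,
                                   many1, parse, spaces)
import           Text.Parsec.Text (Parser)  -- provided by parsec itself here

-- Hypothetical grammar: letters, '=', then decimal digits, e.g. "width = 42".
assignment :: Parser (Text, Int)
assignment = do
  name  <- many1 letter      -- letter accepts any Unicode alphabetic character
  spaces
  _     <- char '='
  spaces
  value <- many1 digit       -- only ASCII digits, so read cannot fail below
  eof
  return (T.pack name, read value)

-- Parse a Text value directly, without converting it to String first.
parseAssignment :: Text -> Either ParseError (Text, Int)
parseAssignment = parse assignment "<input>"
```

In GHCi, parseAssignment (T.pack "width = 42") can be used to try it out. The input stays a Text throughout, and only the matched name is packed back into Text.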
That looks like exactly what you need to do.
It should be compatible with the rest of Parsec, including the Parsec.Char parsers.
If you're using Cabal to build your program, please put an upper bound of parsec-3.1 in your package description, in case the maintainer decides to include that instance in a future version of Parsec, which would then clash with your orphan instance.