Choosing a Haskell parser

There are many open-source parser implementations available in Haskell. Parsec seems to be the standard for text parsing, and attoparsec seems to be a popular choice for binary parsing, but I don't know much beyond that. Is there a particular decision tree you follow when choosing a parser implementation? Have you learned anything interesting about the strengths or weaknesses of the libraries?

Keith asked Jun 19 '10 20:06





2 Answers

You have several good options.

For lightweight parsing of String types:

  • parsec
  • polyparse
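As a rough illustration of the String case, here is a minimal parsec sketch that reads a comma-separated list of integers. The names `int`, `intList` and `parseInts` are made up for the example; polyparse would express the same idea with its own combinators.

```haskell
-- A minimal Parsec sketch: parse a comma-separated list of integers
-- from a plain String, e.g. "1, 2,3".
import Text.Parsec (many1, digit, char, spaces, sepBy, eof, parse)
import Text.Parsec.String (Parser)

-- One non-negative integer; 'many1 digit' guarantees 'read' sees only digits.
int :: Parser Int
int = read <$> many1 digit

-- Integers separated by commas, with optional whitespace around each item,
-- and nothing left over at the end.
intList :: Parser [Int]
intList = spaces *> sepBy (int <* spaces) (char ',' *> spaces) <* eof

-- ParseError has no Eq instance, so collapse it to Nothing for easy checking.
parseInts :: String -> Maybe [Int]
parseInts = either (const Nothing) Just . parse intList "<input>"

main :: IO ()
main = do
  print (parseInts "1, 2,3")  -- Just [1,2,3]
  print (parseInts "1,,2")    -- Nothing
```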

For parsing packed bytestrings, e.g. HTTP headers:

  • attoparsec
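A minimal attoparsec sketch for the HTTP-header case, assuming a single header line whose trailing CRLF has already been stripped. `header` is a hypothetical name for illustration, not part of the library.

```haskell
-- A minimal attoparsec sketch: split one HTTP-style header line
-- ("Name: value", without the trailing CRLF) into its two parts.
import Data.Attoparsec.ByteString.Char8
  (Parser, parseOnly, takeWhile1, char, skipSpace, takeByteString)
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)

-- Name: one or more characters up to the colon, with no whitespace.
-- Value: everything after the colon and any leading spaces.
header :: Parser (B.ByteString, B.ByteString)
header = do
  name <- takeWhile1 (\c -> c /= ':' && not (isSpace c))
  _ <- char ':'
  skipSpace
  value <- takeByteString
  return (name, value)

main :: IO ()
main = do
  print (parseOnly header (B.pack "Content-Type: text/html"))
  print (parseOnly header (B.pack "no colon here"))  -- a Left with an error message
```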

For actual binary data most people use either:

  • binary -- for lazy binary parsing
  • cereal -- for strict binary parsing
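A sketch of the binary style using `Data.Binary.Get`, assuming a hypothetical record layout: a big-endian 16-bit length followed by that many payload bytes. The format and the names `record` and `parseRecord` are invented for illustration.

```haskell
-- A minimal sketch with the binary package's Get monad, for a hypothetical
-- format: a big-endian Word16 length followed by that many payload bytes.
import Data.Binary.Get (Get, runGetOrFail, getWord16be, getByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

record :: Get B.ByteString
record = do
  len <- getWord16be                -- 2-byte length prefix
  getByteString (fromIntegral len)  -- payload of exactly 'len' bytes

-- runGetOrFail reports truncated input as a Left instead of throwing.
parseRecord :: BL.ByteString -> Maybe B.ByteString
parseRecord input = case runGetOrFail record input of
  Right (_, _, payload) -> Just payload
  Left _ -> Nothing

main :: IO ()
main = do
  print (parseRecord (BL.pack [0, 3, 104, 105, 33]))  -- Just "hi!"
  print (parseRecord (BL.pack [0, 5, 104]))           -- Nothing
```

cereal's `Data.Serialize.Get` offers a similar `Get` monad (including `getWord16be` and `getByteString`) that runs over strict ByteStrings instead.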

The main question to ask yourself is: what is the underlying string type?

  • String?
  • bytestring (strict)?
  • bytestring (lazy)?
  • Unicode text?

That decision largely determines which parser toolset you'll use.
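To make the point concrete: parsec 3 happens to abstract over its input through the `Stream` class, so a single parser can run on String, strict ByteString and strict Text, whereas attoparsec, binary and cereal each work with specific input types. A small sketch (the names `number` and `runNumber` are illustrative):

```haskell
{-# LANGUAGE FlexibleContexts #-}
-- One parsec parser, generic over the input type via the Stream class.
import Data.Functor.Identity (Identity)
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T
import Text.Parsec (Parsec, Stream, many1, digit, parse)

-- A non-negative integer, for any stream that yields Chars.
number :: Stream s Identity Char => Parsec s () Int
number = read <$> many1 digit

-- Run the same parser on any supported input type; errors become Nothing.
runNumber :: Stream s Identity Char => s -> Maybe Int
runNumber = either (const Nothing) Just . parse number "<input>"

main :: IO ()
main = do
  print (runNumber "42")           -- String
  print (runNumber (B.pack "42"))  -- strict ByteString
  print (runNumber (T.pack "42"))  -- strict Text
```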

The second question to ask is: do I already have a grammar for the data format? If so, I can just use Happy:

  • The Happy parser generator

And for common data formats, there are a variety of good existing parsers:

  • XML
    • haxml
    • xml-light
    • hxt
    • hexpat
  • CSV
    • bytestring-csv
    • csv
  • JSON
    • json
  • rss/atom
    • feed
Don Stewart answered Oct 18 '22 02:10



Just to add to Don's post: personally, I quite like Text.ParserCombinators.ReadP (part of base) for no-nonsense, quick and easy stuff, particularly when Parsec seems like overkill.
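A quick ReadP sketch of that kind of no-nonsense parsing, using only base: it reads a "key=value" pair where the key is letters and the value is digits. The format and the names `pair` and `parsePair` are made up for illustration.

```haskell
-- A minimal ReadP sketch (Text.ParserCombinators.ReadP ships with base).
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP (ReadP, munch1, char, eof, readP_to_S)

-- Letters, '=', digits, then end of input.
pair :: ReadP (String, Int)
pair = do
  key <- munch1 isAlpha
  _ <- char '='
  value <- munch1 isDigit
  eof
  return (key, read value)

-- readP_to_S lists every parse with its leftover input; because 'pair' ends
-- with eof, only complete parses remain, and we accept exactly one.
parsePair :: String -> Maybe (String, Int)
parsePair s = case readP_to_S pair s of
  [(result, "")] -> Just result
  _ -> Nothing

main :: IO ()
main = do
  print (parsePair "port=8080")  -- Just ("port",8080)
  print (parsePair "port=")      -- Nothing
```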

There is a bytestringreadp library that provides a bytestring version, but it doesn't cover Char8 bytestrings, and I suspect attoparsec would be a better choice at this point.

Sam Martin answered Oct 18 '22 01:10
