 

Haskell/Parsec: how do I use Text.Parsec.Token with Text.Parsec.Indent (from the indents package)

The indents package for Haskell's Parsec provides a way to parse indentation-style languages (like Haskell and Python). It redefines the Parser type, so how do you use the token parser functions exported by Parsec's Text.Parsec.Token module, which are of the normal Parser type?

Background

  • Parsec is a parser combinator library, whatever that means.
  • IndentParser 0.2.1 is an old package providing the two modules Text.ParserCombinators.Parsec.IndentParser and Text.ParserCombinators.Parsec.IndentParser.Token.
  • indents 0.3.3 is a new package providing the single module Text.Parsec.Indent.

Parsec comes with a load of modules. Most of them export a bunch of useful parsers (e.g. newline from Text.Parsec.Char, which parses a newline) or parser combinators (e.g. count n p from Text.Parsec.Combinator, which runs the parser p n times).
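For instance, here is a minimal sketch combining the two examples just mentioned, using the plain Parser alias from Text.Parsec.String (the name threeLettersThenNewline is made up for illustration):

```haskell
import Text.Parsec (count, letter, newline, parse)
import Text.Parsec.String (Parser)

-- 'count 3 letter' runs the parser 'letter' exactly three times;
-- 'newline' then requires a '\n', whose result is discarded by '<*'.
threeLettersThenNewline :: Parser String
threeLettersThenNewline = count 3 letter <* newline
```

With this, parse threeLettersThenNewline "" "abc\n" gives Right "abc", while "ab\n" fails because count needs a third letter.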

However, the module Text.Parsec.Token would like to export functions which are parametrized by the user with features of the language being parsed, so that, for example, the braces p function will run the parser p after parsing a '{' and before parsing a '}', ignoring things like comments, the syntax of which depends on your language.

The way that Text.Parsec.Token achieves this is that it exports a single function makeTokenParser, which you call, giving it the parameters of your specific language (like what a comment looks like) and it returns a record containing all of the functions in Text.Parsec.Token, adapted to your language as specified.
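A minimal sketch of that pattern with the ordinary Identity-based Parser type, assuming the ready-made haskellDef definition from Text.Parsec.Language stands in for a real language definition. The names lexer, letBlock and the local wrappers are illustrative, not part of Parsec:

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (haskellDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- makeTokenParser turns a language definition into a record of token parsers.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

-- Individual parsers are pulled out of the record by applying its field selectors.
braces :: Parser a -> Parser a
braces = P.braces lexer

identifier :: Parser String
identifier = P.identifier lexer

reserved :: String -> Parser ()
reserved = P.reserved lexer

whiteSpace :: Parser ()
whiteSpace = P.whiteSpace lexer

-- The keyword "let" followed by an identifier in braces; whitespace and
-- Haskell-style comments are skipped by the lexeme parsers.
letBlock :: Parser String
letBlock = whiteSpace *> reserved "let" *> braces identifier
```

With this sketch, parse letBlock "" "-- comment\nlet { foo }" gives Right "foo", whereas "let { let }" fails because identifier rejects reserved words of the language.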

Of course, in an indentation-style language, these would need to be adapted further (perhaps? here's where I'm not sure – I'll explain in a moment) so I note that the (presumably obsolete) IndentParser package provides a module Text.ParserCombinators.Parsec.IndentParser.Token which looks to be a drop-in replacement for Text.Parsec.Token.

I should mention at some point that all Parsec parsers are monadic: they thread state, including the current position in the input, which is how error messages can report the line and column in the source file where an error occurred.
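A small illustration of that position tracking, using only functions from parsec (abThenX and failurePosition are made-up names):

```haskell
import Text.Parsec (char, newline, parse, string)
import Text.Parsec.Error (errorPos)
import Text.Parsec.Pos (sourceColumn, sourceLine)
import Text.Parsec.String (Parser)

-- Expects "ab", a newline, then the character 'x'.
abThenX :: Parser Char
abThenX = string "ab" *> newline *> char 'x'

-- Runs the parser and, if it fails, reports the failure as (line, column).
failurePosition :: String -> Maybe (Int, Int)
failurePosition input =
  case parse abThenX "<input>" input of
    Left err -> let pos = errorPos err in Just (sourceLine pos, sourceColumn pos)
    Right _  -> Nothing
```

Here failurePosition "ab\ny" gives Just (2,1): consuming the newline moved the tracked position to line 2, column 1, which is where 'x' was expected.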

My Problem

For a couple of small reasons it appears to me that the indents package is more-or-less the current version of IndentParser, however it does not provide a module that looks like Text.ParserCombinators.Parsec.IndentParser.Token, it only provides Text.Parsec.Indent, so I am wondering how one goes about getting all the token parsers from Text.Parsec.Token (like reserved "something" which parses the reserved keyword "something", or like braces which I mentioned earlier).

It would appear to me that (the new) Text.Parsec.Indent uses some sort of monadic state magic to keep track of which column each piece of source code starts in, so that it doesn't need to modify token parsers like whiteSpace from Text.Parsec.Token, which is probably why it doesn't provide a replacement module. But I am having a problem with types.
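To make that guess concrete, here is a hand-rolled sketch of how a State SourcePos underneath Parsec can drive column checks. It needs only parsec and transformers; withRef, sameColumn, alignedWords and runIParser are hypothetical names, and this is not indents' own code, just the general idea:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (State, evalState, get, put)
import Text.Parsec
  (ParseError, ParsecT, getPosition, letter, many1, runParserT, spaces, unexpected)
import Text.Parsec.Pos (SourcePos, initialPos, sourceColumn)

-- Parsec running on top of State SourcePos (the same shape as
-- ParsecT String () (State SourcePos), used for indentation parsing).
type IParser a = ParsecT String () (State SourcePos) a

-- Hypothetical helper: store the current position as the reference
-- while running p, then restore the previous reference.
withRef :: IParser a -> IParser a
withRef p = do
  saved <- lift get
  getPosition >>= lift . put
  result <- p
  lift (put saved)
  return result

-- Hypothetical helper: succeed (consuming nothing) only when the
-- current column equals the reference column.
sameColumn :: IParser ()
sameColumn = do
  ref <- lift get
  here <- getPosition
  if sourceColumn here == sourceColumn ref
    then return ()
    else unexpected "indentation"

-- Words that each start in the same column as the first word.
alignedWords :: IParser [String]
alignedWords = withRef (many1 (sameColumn *> many1 letter <* spaces))

-- runParserT produces a State action; evalState runs it from the initial position.
runIParser :: IParser a -> String -> Either ParseError a
runIParser p input =
  evalState (runParserT p () "<input>" input) (initialPos "<input>")
```

On "foo\nbar\n  baz" this returns Right ["foo","bar"], stopping before the indented "baz". One design caveat: the reference position lives in the underlying State, which Parsec does not roll back when it backtracks, so a real implementation has to be careful about where the state is changed.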

You see, without Text.Parsec.Indent, all my parsers are of type Parser Something where Something is the return type and Parser is a type alias defined in Text.Parsec.String as

type Parser = Parsec String ()

but with Text.Parsec.Indent, instead of importing Text.Parsec.String, I use my own definition

type Parser a = IndentParser String () a

which makes all my parsers of type IndentParser String () Something, where IndentParser is defined in Text.Parsec.Indent. But the token parsers that I'm getting from makeTokenParser in Text.Parsec.Token have the wrong type.

If this isn't making much sense by now, it's because I'm a bit lost. The type issue is discussed a bit here.


Specifically, when I replace the first definition of Parser above with the second and then use one of the token parsers from Text.Parsec.Token, I get the compile error

Couldn't match expected type `Control.Monad.Trans.State.Lazy.State
                                Text.Parsec.Pos.SourcePos'
            with actual type `Data.Functor.Identity.Identity'
Expected type: P.GenTokenParser
                 String
                 ()
                 (Control.Monad.Trans.State.Lazy.State Text.Parsec.Pos.SourcePos)
  Actual type: P.TokenParser ()

Links

  • Parsec
  • IndentParser (old package)
  • indents, providing Text.Parsec.Indent (new package)
  • some discussion of Parser types with example code
  • another example of using Text.Parsec.Indent

Sadly, neither of the examples above use token parsers like those in Text.Parsec.Token.

Beetle asked Mar 09 '13 20:03


1 Answer

What are you trying to do?

It sounds like you want to have your parsers defined everywhere as being of type

Parser Something

(where Something is the return type) and to make this work by hiding and redefining the Parser type which is normally imported from Text.Parsec.String or similar. You still need to import Text.Parsec.String, but only for the instances it brings into scope (such as String being a Stream over any monad), not for its Parser alias; do this with the line:

import Text.Parsec.String ()

Your definition of Parser is correct. Alternatively and equivalently (for those following the chat in the comments) you can use

import Control.Monad.State (State)
import Text.Parsec (ParsecT)
import Text.Parsec.Pos (SourcePos)

type Parser = ParsecT String () (State SourcePos)

and possibly do away with the import Text.Parsec.Indent (IndentParser) in the file in which this definition appears.

Error, error on the wall

Your problem is that you're looking at the wrong part of the compiler error message. You're focusing on

Couldn't match expected type `State SourcePos' with actual type `Identity'

when you should be focusing on

Expected type: P.GenTokenParser ...
  Actual type: P.TokenParser ...

It compiles!

Where you "import" parsers from Text.Parsec.Token, what you actually do, of course (as you briefly mentioned), is first define a record of your language parameters and then pass it to the function makeTokenParser, which returns a record containing the token parsers.

You must therefore have some lines that look something like this:

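import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

beetleDef :: P.LanguageDef st
beetleDef =
    haskellStyle {
        -- your language parameters go here; this keyword list is only an example
        P.reservedNames = ["let", "in"]
        }

lexer :: P.TokenParser ()
lexer = P.makeTokenParser beetleDef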

... but a P.LanguageDef st is just a GenLanguageDef String st Identity, and a P.TokenParser () is really a GenTokenParser String () Identity.
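To see why that Identity matters, here is a small sketch (word, identityLexer, viaIdentity and viaState are illustrative names). Character-level parsers like many1 letter are polymorphic in the underlying monad, while a lexer built from a P.LanguageDef is fixed to Identity, which is exactly the mismatch the error reports:

```haskell
import Control.Monad.Trans.State (evalState)
import Data.Functor.Identity (Identity)
import Text.Parsec (ParseError, ParsecT, letter, many1, parse, runParserT)
import Text.Parsec.Language (haskellDef)
import Text.Parsec.Pos (initialPos)
import qualified Text.Parsec.Token as P

-- Character-level combinators are polymorphic in the underlying monad,
-- so this one definition works over Identity or over State SourcePos.
word :: Monad m => ParsecT String u m String
word = many1 letter

-- A lexer's type fixes the monad. This typechecks only because
-- P.TokenParser () is an alias for P.GenTokenParser String () Identity.
identityLexer :: P.GenTokenParser String () Identity
identityLexer = (P.makeTokenParser haskellDef :: P.TokenParser ())

-- 'word' at the Identity monad, run with the ordinary 'parse'.
viaIdentity :: Either ParseError String
viaIdentity = parse word "" "hello"

-- 'word' at State SourcePos, run with runParserT and then evalState.
viaState :: Either ParseError String
viaState = evalState (runParserT word () "" "hello") (initialPos "")

-- By contrast, 'P.identifier identityLexer' has type
-- ParsecT String () Identity String, so using it where a
-- ParsecT String () (State SourcePos) String is expected is rejected
-- with a "Couldn't match ... State SourcePos ... Identity" error.
```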

You must change your type declarations to the following:

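import Control.Monad.State (State)
import Text.Parsec (alphaNum, letter, oneOf, (<|>))
import Text.Parsec.Language (haskellStyle)
import Text.Parsec.Pos (SourcePos)
import qualified Text.Parsec.Token as P

beetleDef :: P.GenLanguageDef String st (State SourcePos)
beetleDef =
    haskellStyle {
        -- your language parameters go here; if haskellStyle is typed over
        -- Identity in your parsec version, the four parser-valued fields
        -- must also be set (here mirroring haskellStyle's choices) so that
        -- the record update can change the monad to State SourcePos
        P.identStart  = letter,
        P.identLetter = alphaNum <|> oneOf "_'",
        P.opStart     = oneOf opChars,
        P.opLetter    = oneOf opChars
        }
  where
    opChars = ":!#$%&*+./<=>?@\\^|-~"

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser beetleDef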

... and that's it! This will allow your "imported" token parsers to have type ParsecT String () (State SourcePos) Something, instead of Parsec String () Something (which is an alias for ParsecT String () Identity Something), and your code should now compile.

(For maximum generality, I'm assuming that you might be defining the Parser type in a file separate from, and imported by, the file in which you define your actual parser functions. Hence the two repeated import statements.)
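Putting it together, here is a self-contained sketch in a single file. As one alternative to updating haskellStyle, it builds the definition with the P.LanguageDef constructor, so every field is given directly and nothing is inherited at type Identity. The field values are illustrative; the "def" and "=" keywords, the definition/definitions grammar and the runner runWithIndent are hypothetical. indents has its own runner; this one is written by hand so the sketch only needs parsec and transformers:

```haskell
import Control.Monad.Trans.State (State, evalState)
import Text.Parsec
  (ParseError, ParsecT, alphaNum, eof, letter, many1, oneOf, runParserT, (<|>))
import Text.Parsec.Pos (SourcePos, initialPos)
import qualified Text.Parsec.Token as P

type Parser = ParsecT String () (State SourcePos)

-- Every field is given explicitly, so the monad is State SourcePos throughout.
beetleDef :: P.GenLanguageDef String st (State SourcePos)
beetleDef = P.LanguageDef
  { P.commentStart    = "{-"
  , P.commentEnd      = "-}"
  , P.commentLine     = "--"
  , P.nestedComments  = True
  , P.identStart      = letter
  , P.identLetter     = alphaNum <|> oneOf "_'"
  , P.opStart         = oneOf opChars
  , P.opLetter        = oneOf opChars
  , P.reservedNames   = ["def"]   -- hypothetical keyword
  , P.reservedOpNames = ["="]
  , P.caseSensitive   = True
  }
  where
    opChars = ":!#$%&*+./<=>?@\\^|-~"

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser beetleDef

-- The token parsers now have the indentation-aware Parser type.
identifier :: Parser String
identifier = P.identifier lexer

reserved :: String -> Parser ()
reserved = P.reserved lexer

reservedOp :: String -> Parser ()
reservedOp = P.reservedOp lexer

whiteSpace :: Parser ()
whiteSpace = P.whiteSpace lexer

-- Hypothetical grammar: one or more "def name = value" definitions.
definition :: Parser (String, String)
definition = do
  reserved "def"
  name <- identifier
  reservedOp "="
  value <- identifier
  return (name, value)

definitions :: Parser [(String, String)]
definitions = whiteSpace *> many1 definition <* eof

-- Hypothetical runner: runParserT yields a State SourcePos action,
-- which evalState runs from the initial position.
runWithIndent :: Parser a -> String -> Either ParseError a
runWithIndent p input =
  evalState (runParserT p () "<input>" input) (initialPos "<input>")
```

Because Parser here is just ParsecT over State SourcePos, these token parsers can be sequenced with State-based indentation checks; if you keep the indents import, its combinators should share the same underlying type.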

Thanks

Many thanks to Daniel Fischer for helping me with this.

Beetle answered Nov 16 '22 01:11