Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement a Read instance with the actual parsing already complete, in Haskell?

Tags:

haskell

In this answer, one finds the statement that it would not be very hard to implement a Read instance for the datatype Tree, with the actual parsing already complete.

However, I am having a hard time to understand how such a function as read works: AFAIK, I should implement a readsPrec function instead of read and this readsPrec should perform the reading for strings consisting of only one character. Is this correct?
If that is the case, then how should one implement a Read instance of Tree when the parsing is done via ParsecT? Can we break the parsing word by word, or is there any need to do so?

I didn't think of read as a difficult feature in Haskell, but now I find it quite perplexed and confusing, and I am lost in searching Hoogle for all such unfamiliar things as readP_to_S, readS, etc.

Any help or reference will be greatly appreciated.

like image 996
awllower Avatar asked Aug 07 '16 15:08

awllower


1 Answers

I've been wondering about this for some time, and your question prompted me to look into it.

Summary: Easiest way to do it manually:

instance Read Tree where
    readsPrec _ str = [(parsecRead str,"")]

But deriving is the safer option; the above doesn't work with [Tree] and other data types. My understanding is that Show and Read are not intended to be implemented manually; they should be derived and work on syntactically correct Haskell expressions.


It looks like the reason that Read isn't as simple as

class Read a where
    read :: String -> a

is that there is system of parser combinators, similar to but distinct from Parsec, that's designed to be modular, recursive, and so on. But since we are already using a different parser combinator library, Parsec, I'd think it's best to mess with the other system as little as possible.

The Prelude documentation says that a minimal complete implementation of Read is readsPrec or readPrec. The latter is described as "Proposed replacement for readsPrec using new-style parsers (GHC only)". This smells like trouble to me, so let's go with implementing readsPrec.

The type is

readsPrec :: Read a => Int -> ReadS a
type ReadS a = String -> [(a,String)]

and the documentation for ReadS reads "A parser for a type a, represented as a function that takes a String and returns a list of possible parses as (a,String) pairs". To me, it's not entirely obvious what a "parse" is, but a peek at the source code for read in Text.Read is revealing:

read :: Read a => String -> a
read s = either errorWithoutStackTrace id (readEither s)

readEither :: Read a => String -> Either String a
readEither s =
    -- minPrec is defined as 0 in Text.ParserCombinators.ReadPrec
    case [ x | (x,"") <- readPrec_to_S read' minPrec s ] of
      [x] -> Right x
      []  -> Left "Prelude.read: no parse"
      _   -> Left "Prelude.read: ambiguous parse"
  where
   read' = -- read' :: P.ReadPrec a
     do x <- readPrec
        lift P.skipSpaces -- P is Text.ParserCombinators.ReadP
        return x

I tried to expand the definitions of readPrec_to_S etc., but I felt it wasn't worth it. I think the definition makes clear that we should return [(x,"")] as a successful parse.

The integer argument to readsPrec seems to be the "precedence context". My guess is that it is safe to ignore it if we just want to parse one tree at a time, but that ignoring it will cause problems later if we try to parse instances of [Tree] for example. I will ignore it because I don't think it's worth the trouble.

In short, if we have parsecRead :: String -> Tree as defined in the post you refer to (the author called it read')

instance Read Tree where
    readsPrec _ str = [(parsecRead str,"")]

If we check how this works in a program (using the Show instance that the original asker provided):

main = do
    print (read "ABC(DE)F" == example)
    print ([read "ABC(DE)F", read "ABC(DE)F"] :: [Tree])
    print (read "[ABC(DE)F,ABC(DE)F]" :: [Tree])

we get

True
[ABC(DE)F,ABC(DE)F]
Test.hs: Prelude.read: no parse

The complexity and lack of documentation here actually makes me think that deriving (Read) is actually the only safe option, unless you are willing to dive into the details of precedence levels. I think I've read somewhere that Show and Read are actually mainly intended to be derived, and that the strings are intended to be syntactically correct Haskell expressions (please correct me if I'm wrong). For more general parsing, libraries like Parsec are probably the better option.

If you have the energy to look into the source code yourself, the relevant modules appear to be

  • Text.Read
  • GHC.Read
  • Text.ParserCombinators.ReadP
  • Text.ParserCombinators.ReadPrec
like image 61
Elias Riedel Gårding Avatar answered Nov 10 '22 01:11

Elias Riedel Gårding