Running Haskell HXT outside of IO?

Tags: haskell, hxt

All the examples I've seen so far of the Haskell XML Toolkit, HXT, use runX to execute the parser. runX runs inside the IO monad. Is there a way to use this XML parser outside of IO? Parsing seems like a pure operation to me, so I don't understand why I'm forced to be inside IO.

Muchin asked Oct 10 '10 18:10


1 Answer

You can use HXT's xread along with runLA to parse an XML string outside of IO.

xread has the following type:

xread :: ArrowXml a => a String XmlTree 

This means you can compose it with any arrow of type (ArrowXml a) => a XmlTree Whatever to get an a String Whatever.

runLA is like runX, but for things of type LA:

runLA :: LA a b -> a -> [b] 

LA is an instance of ArrowXml.
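As a minimal sketch of how these pieces fit together, the snippet below composes xread with a few standard HXT filters and runs the result with runLA, giving an ordinary String -> [String] function. The names itemTexts and sample are made up for illustration, and the import assumes HXT 9 or later, where the top-level module is Text.XML.HXT.Core (older releases exposed Text.XML.HXT.Arrow instead).

```haskell
import Text.XML.HXT.Core  -- assumption: HXT 9+; older releases used Text.XML.HXT.Arrow

-- | Hypothetical helper: collect the text of every <item> element that is
-- a direct child of the parsed root element. No IO is involved.
itemTexts :: String -> [String]
itemTexts =
  runLA (xread >>> getChildren >>> hasName "item" >>> getChildren >>> getText)

-- Illustrative input, not taken from the question.
sample :: String
sample = "<list><item>one</item><item>two</item></list>"

main :: IO ()
main = print (itemTexts sample)  -- IO is only needed here, for printing
```

Because itemTexts is pure, it can be called from any other pure code or tested without touching IO.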

To put this all together, the following version of my answer to your previous question uses HXT to parse a string containing well-formed XML without any IO involved:

{-# LANGUAGE Arrows #-}
module Main where

import qualified Data.Map as M
-- In HXT 9 and later this module is named Text.XML.HXT.Core.
import Text.XML.HXT.Arrow

-- Build a map from each child div's class attribute to its text content.
classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> pairs) >>> arr M.fromList
  where
    divs = getChildren >>> hasName "div"
    pairs = proc div -> do
      cls <- getAttrValue "class" -< div
      val <- deep getText         -< div
      returnA -< (cls, val)

-- Look up each requested class name, emitting one (name, value) pair per name.
getValues :: (ArrowXml a) => [String] -> a XmlTree (String, Maybe String)
getValues cs = classes >>> arr (zip cs . lookupValues cs) >>> unlistA
  where lookupValues cs m = map (flip M.lookup m) cs

xml :: String
xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
      \<div class='c3'>123</div><div class='c4'>234</div></div>"

-- Pure: xread parses the string and runLA runs the arrow without IO.
values :: [(String, Maybe String)]
values = runLA (xread >>> getValues ["c1", "c2", "c3", "c4"]) xml

main :: IO ()
main = print values

classes and getValues are similar to the previous version, with a few minor changes to suit the expected input and output. The main difference is that here we use xread and runLA instead of readString and runX.

It would be nice to be able to read something like a lazy ByteString in a similar manner, but as far as I know this isn't currently possible with HXT.


A couple of other things: you can parse strings in this way without IO, but it's probably better to use runX whenever you can: it gives you more control over the configuration of the parser, error messages, etc.
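For comparison, here is a rough sketch of the runX route with explicit parser options. It assumes the HXT 9 API, where readString takes a list of configuration values such as withValidate and withWarnings; older releases configured the parser differently. The names itemTextsIO and sample are hypothetical.

```haskell
import Text.XML.HXT.Core  -- assumption: HXT 9+

-- | Hypothetical helper: parse a string inside IO, with validation and
-- warnings switched off, and collect the text of every <item> element.
itemTextsIO :: String -> IO [String]
itemTextsIO src = runX $
  readString [withValidate no, withWarnings no] src
    >>> deep (hasName "item")   -- readString wraps the document in a root node
    >>> getChildren
    >>> getText

-- Illustrative input, not taken from the question.
sample :: String
sample = "<list><item>one</item><item>two</item></list>"

main :: IO ()
main = itemTextsIO sample >>= print
```

Note the use of deep here: unlike xread, readString returns a document root node, so the elements of interest sit one level further down.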

Also: I tried to make the code in the example straightforward and easy to extend, but the combinators in Control.Arrow and Control.Arrow.ArrowList make it possible to work with arrows much more concisely if you like. The following is an equivalent definition of classes, for example:

classes = (getChildren >>> hasName "div" >>> pairs) >. M.fromList
  where pairs = getAttrValue "class" &&& deep getText
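To make the two combinators in that definition a little more concrete, here is a small self-contained sketch. The functions countItems and idToText and the sample input are invented for illustration, and the import again assumes HXT 9 or later.

```haskell
import qualified Data.Map as M
import Text.XML.HXT.Core  -- assumption: HXT 9+; older releases used Text.XML.HXT.Arrow

-- (>.) gathers every result of the arrow on its left into a list and applies
-- a plain function to that list; f >. g behaves like listA f >>> arr g.
countItems :: String -> [Int]
countItems = runLA ((xread >>> getChildren >>> hasName "item") >. length)

-- (&&&) feeds the same input to both arrows and pairs up their results.
idToText :: String -> [M.Map String String]
idToText =
  runLA ((xread >>> getChildren >>> hasName "item" >>> pair) >. M.fromList)
  where
    pair = getAttrValue "id" &&& (getChildren >>> getText)

-- Illustrative input, not taken from the question.
sample :: String
sample = "<list><item id='a'>one</item><item id='b'>two</item></list>"

main :: IO ()
main = do
  print (countItems sample)
  print (idToText sample)
```

The explicit parentheses around the left-hand side of >. avoid relying on operator fixities when mixing it with >>>.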
Travis Brown answered Sep 21 '22 13:09