Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell HXT for extracting a list of values

Tags:

xml

haskell

hxt

I'm trying to figure my way through HXT with XPath and arrows at the same time and I'm completely stuck on how to think through this problem. I've got the following HTML:

<div>
<div class="c1">a</div> 
<div class="c2">b</div> 
<div class="c3">123</div> 
<div class="c4">234</div> 
</div>

which I've extracted into an HXT XmlTree. What I'd like to do is define a function (I think?):

getValues :: [String] -> IOSArrow Xmltree [(String, String)]

Which, if used as getValues ["c1", "c2", "c3", "c4"], will get me:

[("c1", "a"), ("c2", "b"), ("c3", "123"), ("c4", "234")]

Help please?

like image 264
Muchin Avatar asked Oct 09 '10 18:10

Muchin


1 Answers

Here's one approach (my types are a bit more general and I'm not using XPath):

{-# LANGUAGE Arrows #-}
module Main where

import qualified Data.Map as M
import Text.XML.HXT.Arrow

classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> divs >>> pairs) >>> arr M.fromList
  where
    divs = getChildren >>> hasName "div"
    pairs = proc div -> do
      cls <- getAttrValue "class" -< div
      val <- deep getText         -< div
      returnA -< (cls, val)

getValues :: (ArrowXml a) => [String] -> a XmlTree [(String, Maybe String)]
getValues cs = classes >>> arr (zip cs . lookupValues cs)
  where lookupValues cs m = map (flip M.lookup m) cs

main = do
  let xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
            \<div class='c3'>123</div><div class='c4'>234</div></div>"

  print =<< runX (readString [] xml >>> getValues ["c1", "c2", "c3", "c4"])

I would probably run an arrow to get the map and then do the lookups, but this way works as well.


To answer your question about listA: divs >>> divs >>> pairs is a list arrow with type a XmlTree (String, String)—i.e., it's a non-deterministic computation that takes an XML tree and returns string pairs.

arr M.fromList has type a [(String, String)] (M.Map String String). This means we can't just compose it with divs >>> divs >>> pairs, since the types don't match up.

listA solves this problem: it collapses divs >>> divs >>> pairs into a deterministic version with type a XmlTree [(String, String)], which is exactly what we need.

like image 151
Travis Brown Avatar answered Nov 15 '22 07:11

Travis Brown