If I have an XML document like this:
<root>
<elem name="Greeting">
Hello
</elem>
<elem name="Name">
Name
</elem>
</root>
and some Haskell type/data definitions like this:
type Name = String
type Value = String
data LocalizedString = LS Name Value
and I wanted to write a Haskell function with the following signature:
getLocalizedStrings :: String -> [LocalizedString]
where the first parameter was the XML text, and the returned value was:
[LS "Greeting" "Hello", LS "Name" "Name"]
how would I do this?
If HaXml is the best tool, how would I use HaXml to achieve the above goal?
Thanks!
I've never actually bothered to figure out how to extract bits out of XML documents using HaXml; HXT has met all my needs.
{-# LANGUAGE Arrows #-}
import Data.Maybe
-- hxt 9 and later expose this as Text.XML.HXT.Core;
-- older releases called the module Text.XML.HXT.Arrow.
import Text.XML.HXT.Core
type Name = String
type Value = String
data LocalizedString = LS Name Value
-- Parse the XML text; Nothing means no <root> element was found.
getLocalizedStrings :: String -> Maybe [LocalizedString]
getLocalizedStrings = (.) listToMaybe . runLA $ xread >>> getRoot
-- Find elements with the given name anywhere in the tree.
atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep $ isElem >>> hasName tag
getRoot :: ArrowXml a => a XmlTree [LocalizedString]
getRoot = atTag "root" >>> listA getElem
getElem :: ArrowXml a => a XmlTree LocalizedString
getElem = atTag "elem" >>> proc x -> do
    name <- getAttrValue "name" -< x
    value <- getChildren >>> getText -< x
    returnA -< LS name value
You'd probably like a little more error-checking (i.e. don't just lazily use atTag like me; actually verify that <root> is the root, <elem> is a direct descendant, etc.), but this works just fine on your example.
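As a rough sketch of that stricter checking, the version below only accepts a top-level <root> element and only looks at its direct <elem> children, instead of searching the whole tree with deep. The names childElem, getElemStrict, getRootStrict and getLocalizedStringsStrict are made up for this illustration, and it assumes hxt 9 or later (Text.XML.HXT.Core).

```haskell
import Data.Maybe (listToMaybe)
import Text.XML.HXT.Core

type Name = String
type Value = String
data LocalizedString = LS Name Value
  deriving (Eq, Show)

-- Hypothetical helper: direct child elements with the given name only
-- (no deep search through the tree).
childElem :: ArrowXml a => String -> a XmlTree XmlTree
childElem tag = getChildren >>> isElem >>> hasName tag

-- Pair the "name" attribute with each text child of the element.
getElemStrict :: ArrowXml a => a XmlTree LocalizedString
getElemStrict =
  getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS

-- Succeeds only if the node itself is an element called "root".
getRootStrict :: ArrowXml a => a XmlTree [LocalizedString]
getRootStrict =
  isElem >>> hasName "root" >>> listA (childElem "elem" >>> getElemStrict)

-- Nothing when the top-level node is not a <root> element.
getLocalizedStringsStrict :: String -> Maybe [LocalizedString]
getLocalizedStringsStrict = listToMaybe . runLA (xread >>> getRootStrict)
```

Note that getText returns the text exactly as it appears in the document, so with the XML from the question the values still carry their surrounding newlines.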
Now, if you need an introduction to Arrows, unfortunately I don't know of any good one. I myself learned it the "thrown into the ocean to learn how to swim" way.
Something that may be helpful to keep in mind is that the proc/-< syntax is simply sugar for the basic arrow operations (arr, >>>, etc.), just like do/<- is simply sugar for the basic monad operations (return, >>=, etc.). The following are equivalent:
getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS

proc x -> do
    name <- getAttrValue "name" -< x
    value <- getChildren >>> getText -< x
    returnA -< LS name value
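Since ordinary functions are arrows too, the same equivalence can be tried out without HXT. The sketch below is only an illustration: firstWord and restWords are made-up stand-ins for getAttrValue "name" and getChildren >>> getText, and viaCombinators and viaProc are hypothetical names for the two styles.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA, (&&&), (>>^))

type Name = String
type Value = String
data LocalizedString = LS Name Value
  deriving (Eq, Show)

-- Hypothetical stand-ins for the HXT arrows: split "name value" at the
-- first space.
firstWord :: String -> Name
firstWord = takeWhile (/= ' ')

restWords :: String -> Value
restWords = drop 1 . dropWhile (/= ' ')

-- Combinator style: run both functions on the same input, then build LS.
viaCombinators :: String -> LocalizedString
viaCombinators = firstWord &&& restWords >>^ uncurry LS

-- proc notation: the same computation, written with named intermediates.
viaProc :: String -> LocalizedString
viaProc = proc x -> do
  name  <- firstWord -< x
  value <- restWords -< x
  returnA -< LS name value
```

In GHCi, viaCombinators "Greeting Hello" and viaProc "Greeting Hello" should both give LS "Greeting" "Hello".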
Use one of the XML packages.
The most popular are, in order,
FWIW, HXT seems like overkill where a simple TagSoup will do :)
Here's my second attempt (after receiving some good input from others) with TagSoup:
module Xml where
import Data.Char
import Text.HTML.TagSoup
type SName = String
type SValue = String
data LocalizedString = LS SName SValue
    deriving Show
getLocalizedStrings :: String -> [LocalizedString]
getLocalizedStrings = create . filterTags . parseTags
  where
    -- Current tagsoup parameterises Tag over the string type.
    filterTags :: [Tag String] -> [Tag String]
    filterTags = filter (\x -> isTagOpenName "elem" x || isTagText x)
    -- An <elem name="..."> tag directly followed by text yields one LS.
    create :: [Tag String] -> [LocalizedString]
    create (TagOpen "elem" [("name", name)] : TagText text : rest) =
        LS name (trimWhiteSpace text) : create rest
    create (_:rest) = create rest
    create [] = []
-- Strip leading and trailing whitespace.
trimWhiteSpace :: String -> String
trimWhiteSpace = dropWhile isSpace . reverse . dropWhile isSpace . reverse
main :: IO ()
main = do
    xml <- readFile "xml.xml" -- xml.xml contains the xml in the original question.
    putStrLn . show . getLocalizedStrings $ xml
The first attempt showcased a naive (and faulty) method for trimming whitespace off of a string.
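For comparison, the trimming could also be written with dropWhileEnd from Data.List (part of base), which avoids reversing the string twice. This is a sketch of an alternative, not the code used above, and the name trim is made up here.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Drop whitespace from both ends; interior whitespace is kept.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace
```

With this definition, trim "\nHello\n" evaluates to "Hello".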