I'm learning HXT at the moment by using it to parse a GPX file. An example is here. I've got the following so far:
import Data.Time
import Text.XML.HXT.Core
data Gpx = Gpx [Trk] deriving (Show)
data Trk = Trk [TrkSeg] deriving (Show)
data TrkSeg = TrkSeg [TrkPt] deriving (Show)
data TrkPt = TrkPt Double Double deriving (Show)
parseGpx =
getChildren >>> isElem >>> hasName "gpx" >>>
getChildren >>> isElem >>> hasName "trk" >>>
parseGpxTrk >>> arr Gpx
parseGpxTrk = undefined
parseGpxTrkSegs = undefined
You can see that it's incomplete, but it should still type-check. Unfortunately, I'm already running into an error:
Couldn't match type ‘Trk’ with ‘[Trk]’
Expected type: Trk -> Gpx
Actual type: [Trk] -> Gpx
In the first argument of ‘arr’, namely ‘Gpx’
In the second argument of ‘(>>>)’, namely ‘arr Gpx’
What this error says is that I'm trying to pass each matched item from the parseGpxTrk arrow through the arr Gpx constructor, but what I actually want is to pass the entire list of matches through the arr Gpx constructor.
So, how do I get HXT (or arrows in general?) to pass the matches as a list through my arr Gpx constructor instead of passing each entry in the list through the arr Gpx constructor?
Here's a solution that seems pretty good to me
{-# LANGUAGE Arrows #-}
import Data.Maybe
import Text.Read
import Text.XML.HXT.Core
import Control.Applicative
data Gpx = Gpx [Trk] deriving (Show)
data Trk = Trk [TrkSeg] deriving (Show)
data TrkSeg = TrkSeg [TrkPt] deriving (Show)
data TrkPt = TrkPt Double Double deriving (Show)
The trickiest one is probably the parseTrkPt because in order to do it right you have to handle parsing Strings to Double, which can fail. I've made the decision to have it return a Maybe TrkPt instead, and then handle that further down the line:
elemsNamed :: ArrowXml cat => String -> cat XmlTree XmlTree
elemsNamed name = isElem >>> hasName name
parseTrkPt :: ArrowXml cat => cat XmlTree (Maybe TrkPt)
parseTrkPt = elemsNamed "trkpt" >>>
proc trkpt -> do
lat <- getAttrValue "lat" -< trkpt
lon <- getAttrValue "lon" -< trkpt
returnA -< TrkPt <$> readMaybe lat <*> readMaybe lon
I've also used the proc syntax here because I think it comes out a lot cleaner. The TrkPt <$> readMaybe lat <*> readMaybe lon has the type Maybe TrkPt and will return Nothing if either of the readMaybes returns Nothing. We can now aggregate all the successful results:
parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg =
elemsNamed "trkseg" >>>
(getChildren >>> parseTrkPt >>. catMaybes) >. TrkSeg
The parentheses are important here, it took me a while to figure that part out. Depending on where you place the parens you'll get different results, such as [TrkSeg [TrkPt a b], TrkSeg [TrkPt c d]] instead of [TrkSeg [TrkPt a b, TrkPt c d]]. The next to parsers are both straightforward following a similar pattern:
parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk =
elemsNamed "trk" >>>
(getChildren >>> parseTrkSeg) >. Trk
parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx =
elemsNamed "gpx" >>>
(getChildren >>> parseTrk) >. Gpx
Then you can run it quite simply, although you'll have to still drill passed the root element:
main :: IO ()
main = do
gpxs <- runX $ readDocument [withRemoveWS yes] "ana.gpx"
>>> getChildren
>>> parseGpx
-- Pretty print the document
forM_ gpxs $ \(Gpx trks) -> do
putStrLn "GPX:"
forM_ trks $ \(Trk segs) -> do
putStrLn "\tTRK:"
forM_ segs $ \(TrkSeg pts) -> do
putStrLn "\t\tSEG:"
forM_ pts $ \pt -> do
putStr "\t\t\t"
print pt
The trick is to use the methods in the ArrowList typeclass, notably >. which has the type a b c -> ([c] -> d) -> a b d. It aggregates the elements from the ArrowList, passes it to a function that converts it to a new type, then outputs a new ArrowList on that new type d.
If you want you can even abstract this a bit for the last 3 parsers:
nestedListParser :: ArrowXml cat => String -> cat XmlTree a -> ([a] -> b) -> cat XmlTree b
nestedListParser name subparser constructor
= elemsNamed name
>>> (getChildren >>> subparser)
>. constructor
parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg = nestedListParser "trkseg" (parseTrkPt >>. catMaybes) TrkSeg
parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk = nestedListParser "trk" parseTrkSeg Trk
parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx = nestedListParser "gpx" parseTrk Gpx
This might come in handy if you want to complete the rest of the grammar of a GPX file.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With