Parsing tags with TagSoup in Haskell

Question

I've been trying to learn how to extract data from HTML files in Haskell, and have hit a wall. I'm not really experienced with Haskell at all; my previous knowledge is from Python (and BeautifulSoup for HTML parsing).

I'm using TagSoup to look at my HTML (it seemed to be the recommended library) and have a basic idea of how it works. Here's the relevant segment of my code (self-contained; it prints the tags for testing):

import System.IO
import Network.HTTP
import Text.HTML.TagSoup
import Data.List

main :: IO ()
main = do
    http <- simpleHTTP (getRequest "http://www.cbssports.com/nba/scoreboard/20130310") >>= getResponseBody
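    -- Skip every tag before the first <div> open tag
    let tags = dropWhile (~/= TagOpen "div" []) (parseTags http)
    done tags where
        -- Print the remaining tags one per line, then a trailing blank line
        done xs = case xs of
            [] -> putStrLn $ "\n"
            _ -> do
                putStrLn $ show $ head xs
                done (tail xs)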
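To experiment with the matching without a network request, the same dropWhile can be run on an inline string. This is a minimal sketch assuming only that the tagsoup package is installed; sampleHtml is a hypothetical stand-in for the downloaded page. It also shows that an empty attribute value in the pattern acts as a wildcard for "this attribute is present":

```haskell
import Text.HTML.TagSoup

-- Hypothetical inline HTML standing in for the downloaded page,
-- so the matching can be tried without a network request
sampleHtml :: String
sampleHtml =
  "<p>intro</p><div id=\"top\">menu</div><div id=\"scores-1\" class=\"scoreBox\">A</div>"

main :: IO ()
main = do
  let tags = parseTags sampleHtml
  -- The question's pattern: lands on the first <div> of any kind
  print (take 1 (dropWhile (~/= TagOpen "div" []) tags))
  -- An empty attribute value matches any value, so this lands on the
  -- first <div> that has a class attribute at all
  print (take 1 (dropWhile (~/= TagOpen "div" [("class", "")]) tags))
```

Since ~/= is simply the negation of ~==, dropWhile stops at the first tag that matches the pattern.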

However, I don't want to stop at just any "div" tag. I want to drop everything before a tag of this form:

TagOpen "div" [("id","scores-1997830"),("class","scoreBox spanCol2")]
TagOpen "div" [("id","scores-1997831"),("class","scoreBox spanCol2 lastCol")]

I've tried writing it out:

let tags = dropWhile (~/= TagOpen "div" [("id", "scores-[0-9]+"), ("class", "scoreBox( spanCol[0-9]?)+( lastCol)?")]) (parseTags http)

But then it matches the literal characters [0-9]+ instead of treating them as a regular expression. I haven't figured out a workaround with the Text.Regex.Posix module yet, and escaping the characters doesn't work. What's the solution here?
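The literal comparison can be seen with a small check, assuming the tagsoup package is installed; scoreDiv is a hypothetical name for one of the tags listed above. A non-empty attribute value in the pattern must equal the tag's value exactly, which also means a single class name does not match a multi-class attribute:

```haskell
import Text.HTML.TagSoup

-- One of the tags from the question
scoreDiv :: Tag String
scoreDiv = TagOpen "div" [("id", "scores-1997830"), ("class", "scoreBox spanCol2")]

main :: IO ()
main = do
  -- Exact attribute value: True
  print (scoreDiv ~== TagOpen "div" [("id", "scores-1997830")])
  -- Regex syntax is compared character by character: False
  print (scoreDiv ~== TagOpen "div" [("id", "scores-[0-9]+")])
  -- The whole class string must be equal, so one class name alone
  -- does not match either: False
  print (scoreDiv ~== TagOpen "div" [("class", "scoreBox")])
```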

Koterpillar · Accepted Answer

~== does not do regular expression matching; you will have to write a matcher yourself, something along the lines of:

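import Data.Maybe
import Text.Regex
import Text.HTML.TagSoup

-- TagOpen is a constructor of Tag, so the argument's type is Tag String
goodTag :: Tag String -> Bool
goodTag tag = tag ~== TagOpen "div" []
    -- && short-circuits, so fromAttrib only ever sees <div> open tags
    && fromAttrib "id" tag `matches` "scores-[0-9]+"

-- Just a wrapper around Text.Regex.matchRegex
matches :: String -> String -> Bool
matches string regex = isJust $ mkRegex regex `matchRegex` string

-- Usage: dropWhile (not . goodTag) (parseTags http)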
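Since the question mentions Text.Regex.Posix, here is a sketch of the same idea using its =~ operator instead of matchRegex. It assumes the tagsoup and regex-posix packages are installed; isScoreBox and sampleHtml are hypothetical names, and the ^...$ anchors and the class check are additions beyond the answer's goodTag:

```haskell
import Text.HTML.TagSoup
import Text.Regex.Posix ((=~))

-- A <div> whose id is "scores-" followed only by digits and whose class
-- attribute contains the word "scoreBox". The ^ and $ anchors are an
-- addition: without them an id such as "xscores-1" would also match.
isScoreBox :: Tag String -> Bool
isScoreBox tag =
  tag ~== TagOpen "div" [("id", ""), ("class", "")]
    && (fromAttrib "id" tag =~ "^scores-[0-9]+$")
    && ("scoreBox" `elem` words (fromAttrib "class" tag))

-- Hypothetical stand-in for the downloaded scoreboard page
sampleHtml :: String
sampleHtml = concat
  [ "<div id=\"header\">menu</div>"
  , "<div id=\"scores-1997830\" class=\"scoreBox spanCol2\">A</div>"
  , "<div id=\"scores-1997831\" class=\"scoreBox spanCol2 lastCol\">B</div>"
  ]

main :: IO ()
main = do
  let tags = parseTags sampleHtml
  -- Same idea as the question's dropWhile, with the predicate negated
  print (take 1 (dropWhile (not . isScoreBox) tags))
  -- Or keep every matching tag
  mapM_ print (filter isScoreBox tags)
```

Splitting the class attribute with words means the check does not depend on which other class names (spanCol2, lastCol) appear or in what order.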
