Multiline Matching in Haskell Posix

Question

I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.

Can anyone point me in the right direction for matching across multiple lines of a string?

A snippet for the curious:

> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String

I'm trying to extract the source of Wikipedia pages, but this method clearly falls over when more than one line is involved.

ephemient · Accepted Answer

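By default, Text.Regex.Posix compiles patterns with compNewline (POSIX REG_NEWLINE), under which . and negated bracket expressions such as [^>] never match a newline. Compiling the regex without that option lets the match span lines. You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.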

extractToken body = match regex body where
    regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

Since Text.Regex.Posix's defaultCompOpt = compExtended + compNewline, that is equivalent to

extractToken body = match regex body where
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

To pull out just the first group, use one of the other RegexContext result types (the class is defined in Text.Regex.Base.RegexLike). One possibility is

extractToken body = head groups where
    (preMatch, inMatch, postMatch, groups) =
        match regex body :: (String, String, String, [String])
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
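Putting the pieces above together, here is a minimal self-contained sketch, assuming the regex-posix package is installed. The Maybe return type and the name textareaRegex are my own additions, not part of the answer's code; they avoid the crash that head would cause when nothing matches.

```haskell
import Text.Regex.Posix
import Text.Regex.Base.RegexLike (makeRegexOpts, match, defaultExecOpt)

-- Compile with compExtended only, i.e. without compNewline,
-- so that '.' and '[^>]' are allowed to match newline characters.
textareaRegex :: Regex
textareaRegex = makeRegexOpts compExtended defaultExecOpt
  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Return the first capture group, or Nothing when the regex does not match.
-- (Hypothetical variant of extractToken: the original returns a bare String.)
extractToken :: String -> Maybe String
extractToken body =
  case match textareaRegex body :: (String, String, String, [String]) of
    (_, _, _, firstGroup : _) -> Just firstGroup
    _                         -> Nothing
```

For example, extractToken "<textarea id=\"wpTextbox1\">line one\nline two</textarea>" should yield Just "line one\nline two". Note that POSIX regexes have no non-greedy quantifiers, so (.*) runs to the last </textarea> in the input rather than the first.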

Don Stewart · Answer

If you need more flexibility or better performance than POSIX regexes offer, you may want to use a PCRE backend instead.

pcre-light and regex-pcre are both fine.
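As a rough sketch of what the PCRE route could look like, assuming the regex-pcre package: the inline (?s) flag makes . match newlines, and the non-greedy (.*?) stops at the first closing tag. The function name extractTokenPcre and the Maybe return type are illustrative choices, not something from the answers above.

```haskell
import Text.Regex.PCRE ((=~))

-- Hypothetical PCRE variant of extractToken.
-- (?s)  : "dot-all" mode, so '.' also matches newline characters.
-- (.*?) : non-greedy, so the match ends at the first </textarea>.
extractTokenPcre :: String -> Maybe String
extractTokenPcre body =
  case body =~ pat :: (String, String, String, [String]) of
    (_, _, _, firstGroup : _) -> Just firstGroup
    _                         -> Nothing
  where
    pat = "(?s)<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*?)</textarea>"
```

The trade-off is an extra dependency on the system's PCRE library in exchange for non-greedy matching and inline flags, neither of which POSIX extended regexes provide.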


Tags: regex, posix, functional-programming, haskell, cabal

Ian Elliott
