
PCRE in Haskell - what, where, how?

I've been searching for some documentation or a tutorial on Haskell regular expressions for ages. There's no useful information on the HaskellWiki page. It simply gives the cryptic message:

Documentation
Coming soonish.

There is a brief blog post which I have found fairly helpful; however, it only deals with POSIX regular expressions, not PCRE.

I've been working with POSIX regexes for a few weeks, and I'm coming to the conclusion that I need PCRE for my task.

My problem is that I don't know where to start with PCRE in Haskell. I've downloaded regex-pcre-builtin with cabal but I need an example of a simple matching program to help me get going.

  • Is it possible to implement multi-line matching?
  • Can I get the matches back in this format: [(MatchOffset,MatchLength)]?
  • What other formats can I get the matches back in?

Thank you very much for any help!

Nick Brunt asked Oct 24 '11 22:10


5 Answers

There's also regex-applicative which I've written.

The idea is that you can assign some meaning to each piece of a regular expression and then compose them, just as you write parsers using Parsec.

Here's an example -- simple URL parsing.

import Text.Regex.Applicative

data Protocol = HTTP | FTP deriving Show

protocol :: RE Char Protocol
protocol = HTTP <$ string "http" <|> FTP <$ string "ftp"

type Host = String
type Location = String
data URL = URL Protocol Host Location deriving Show

host :: RE Char Host
host = many $ psym $ (/= '/')

url :: RE Char URL
url = URL <$> protocol <* string "://" <*> host <* sym '/' <*> many anySym

main = print $ "http://stackoverflow.com/questions" =~ url

Roman Cheplyaka answered Nov 15 '22 23:11


There are two main options when wanting to use PCRE-style regexes in Haskell:

  • regex-pcre uses the same interface as described in that blog post (and also in Real World Haskell, which I think contains an expanded version of that blog post); it can optionally be extended with pcre-less. regex-pcre-builtin seems to be a pre-release snapshot of this and probably shouldn't be used.

  • pcre-light is a set of bindings to the PCRE library. It doesn't provide the return types you're after, just all the matches (if any). However, the pcre-light-extras package provides a MatchResult class, for which you might be able to write such an instance. This can be enhanced with regexqq, which uses quasi-quoting to ensure that your regex pattern type-checks; however, it doesn't work with GHC 7 (and unless someone takes over maintaining it, it won't).
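
For comparison, here is a minimal sketch of what pcre-light usage looks like, assuming the pcre-light package and the system PCRE library are installed. The pattern and the input string are made up for illustration. It shows that match returns the whole match followed by the captured groups, rather than offsets and lengths.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import Text.Regex.PCRE.Light (compile, match)

-- Hypothetical pattern and input, chosen only for illustration.
-- The result is Just [whole match, group 1, group 2], or Nothing if
-- the pattern does not match.
captures :: Maybe [ByteString]
captures = match (compile "(\\w+)@(\\w+)" []) "user@host" []

main :: IO ()
main = print captures  -- Just ["user@host","user","host"]
```

Both option lists are left empty here; compile-time flags such as multiline would go in the first list.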

So, assuming that you go with regex-pcre:

  • Multi-line matching: according to this answer, yes.

  • Matches as [(MatchOffset,MatchLength)]: I think so, via the MatchArray type (it returns an array, from which you can then extract the list).

  • Other result formats: see here for all possible results from a regex.
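
As a starting point, the following is a minimal sketch of a matching program, assuming regex-pcre (and the system PCRE library it binds to) is installed. The sample string and patterns are made up for illustration. The result type you ask for selects the format, and [(MatchOffset, MatchLength)] is available through getAllMatches from regex-base.

```haskell
import Text.Regex.PCRE

-- Hypothetical sample input, chosen only for illustration.
sample :: String
sample = "foo bar foo"

-- Does the pattern match anywhere in the input?
matched :: Bool
matched = sample =~ "fo+"

-- Every match as (offset, length), the format asked about in the question.
offsets :: [(MatchOffset, MatchLength)]
offsets = getAllMatches (sample =~ "fo+")

-- Every matched substring.
texts :: [String]
texts = getAllTextMatches (sample =~ "fo+")

-- (text before the first match, the match itself, text after it).
pieces :: (String, String, String)
pieces = sample =~ "bar"

main :: IO ()
main = do
  print matched  -- True
  print offsets  -- [(0,3),(8,3)]
  print texts    -- ["foo","foo"]
  print pieces   -- ("foo ","bar"," foo")
```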

ivanm answered Nov 15 '22 23:11


Well, I wrote much of the wiki page and may have written "Coming soonish". The regex-pcre package was my wrapping of PCRE using the regex-base interface, where regex-base is used as the interface for several very different regular expression engine backends. Don Stewart's pcre-light package does not have this abstraction layer and is thus much smaller.

The blog post on Text.Regex.Posix uses my regex-posix package, which is also built on top of regex-base. Thus the usage of regex-pcre will be very similar to that blog post, except that PCRE's compile and execution options are different.

For configuring regex-pcre the Text.Regex.PCRE.Wrap module has the constants you need. Use makeRegexOptsM from regex-base to specify the options.
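
A minimal sketch of that configuration step, assuming regex-pcre is installed: it compiles a pattern with compMultiline so that ^ matches at the start of every line. The input text and the names multilineRegex and linesStartingWithB are made up for illustration.

```haskell
import Text.Regex.PCRE
import Text.Regex.PCRE.Wrap (compMultiline, execBlank)

-- Hypothetical three-line input, chosen only for illustration.
input :: String
input = "alpha\nbeta\nbravo"

-- Compile with PCRE's multi-line flag so ^ and $ match at line boundaries.
-- makeRegexOptsM reports a bad pattern in the chosen monad (Maybe here).
multilineRegex :: Maybe Regex
multilineRegex = makeRegexOptsM compMultiline execBlank "^b\\w+"

-- All substrings matched by the compiled regex, or [] if it failed to compile.
linesStartingWithB :: [String]
linesStartingWithB = case multilineRegex of
  Just re -> getAllTextMatches (match re input)
  Nothing -> []

main :: IO ()
main = print linesStartingWithB  -- ["beta","bravo"]
```

Without compMultiline the same pattern would only be tried at the very start of the input, so this example would find nothing.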

Chris Kuklewicz answered Nov 15 '22 23:11


regexpr is another PCRE-ish lib that's cross-platform and quick to get started with.

Simon Michael answered Nov 16 '22 00:11


I find rex to be quite nice too; I think its ViewPatterns integration is a nice idea.

It can be verbose, though that is partly inherent in regular expressions themselves.

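{-# LANGUAGE QuasiQuotes, ViewPatterns #-}
import Data.Time
import Text.Regex.PCRE.Rex

-- Each (?{read -> name}...) group binds its capture through a view pattern.
parseDate :: String -> LocalTime
parseDate [rex|(?{read -> year}\d+)-(?{read -> month}\d+)-
        (?{read -> day}\d+)\s(?{read -> hour}\d+):(?{read -> mins}\d+):
        (?{read -> sec}\d+)|] =
    LocalTime (fromGregorian year month day) (TimeOfDay hour mins sec)
parseDate v = error $ "invalid date " ++ v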

That said, I just discovered regex-applicative, mentioned in one of the other answers, and it may be a better choice: it could be less verbose and more idiomatic. On the other hand, rex has basically zero learning curve if you already know regular expressions, which can be a plus.

Emmanuel Touzery answered Nov 16 '22 00:11