"Raw" string in Haskell for Regular Expression

Tags: regex, haskell

I'm having trouble creating a regular expression in Haskell. I'm trying to convert this string (which matches a URL in a piece of text)

\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b

into a regular expression. The trouble is that I keep getting this error in ghci:

Prelude Text.RegExp> let a = fromString "\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b"

<interactive>:1:27:
    lexical error in string/character literal at character 'S'

I'm guessing it fails because Haskell doesn't understand \S as an escape code. Is there any way to get around this?

In Scala you can surround a string with three double quotes to avoid escaping. Can you do something similar in Haskell?

Any help would be appreciated.

asked May 25 '11 09:05 by djhworld


2 Answers

Every backslash in your string has to be written as a double backslash inside the double quotes, so the pattern becomes:

"\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"
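To see that the doubled backslashes only exist in the source code, here is a minimal sketch that prints the escaped literal. The name `urlPattern` is made up for illustration. Note that `\b` on its own is a valid Haskell escape (backspace), which is why the lexer in the question only complained once it reached `\S`.

```haskell
module Main where

-- The pattern from the question with every backslash doubled.
-- `urlPattern` is a hypothetical name used only for this example.
urlPattern :: String
urlPattern = "\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"

main :: IO ()
main = do
  -- putStrLn shows the actual characters, so each "\\" appears as one backslash.
  putStrLn urlPattern
  -- "\\S" is two characters: a backslash followed by 'S'.
  print (length "\\S")
  -- "\b" is a single backspace character, not a backslash followed by 'b'.
  print ("\b" == "\\b")
```

The string that reaches the regex library therefore contains single backslashes, exactly as in the original pattern.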

A more general remark: you'd be better off writing a proper parser rather than using regular expressions. Regular expressions rarely do exactly the right thing.

answered Oct 26 '22 17:10 by augustss


Haskell doesn't support raw strings out of the box. In GHC, however, it's very easy to implement them using quasiquotation:

r :: QuasiQuoter
r = QuasiQuoter {      
    quoteExp  = return . LitE . StringL
    ...
}

Usage:

ghci> :set -XQuasiQuotes
ghci> let s = [r|\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b|]
ghci> s
"\\b(((\\S+)?)(@|mailto\\:|(news|(ht|f)tp(s?))\\://)\\S+)\\b"
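For reference, a complete version of such a quoter might look like the sketch below. Only the `quoteExp` field comes from the snippet above. The other three fields are an assumption on my part (they simply reject use outside expressions) and are not necessarily what the elided code does. The `main` here only inspects the expression the quoter would generate. In a compiled program the quoter has to live in a different module from the one that uses `[r|...|]`, because of GHC's stage restriction.

```haskell
module Main where

import Language.Haskell.TH (Exp (LitE), Lit (StringL), runQ)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- A raw-string quasiquoter: the quoted text becomes a String literal verbatim.
r :: QuasiQuoter
r = QuasiQuoter
  { quoteExp  = return . LitE . StringL
    -- Assumption: contexts other than expressions are simply rejected.
  , quotePat  = \_ -> fail "r: raw strings are not supported in patterns"
  , quoteType = \_ -> fail "r: raw strings are not supported in types"
  , quoteDec  = \_ -> fail "r: raw strings are not supported in declarations"
  }

main :: IO ()
main = do
  -- Run the quoter by hand on the two-character input \S and show the
  -- Template Haskell expression it produces.
  e <- runQ (quoteExp r "\\S")
  print e
```

The quoter does no processing at all: whatever appears between `[r|` and `|]` is wrapped in a string literal unchanged, so no escape sequences are interpreted.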

I've released a slightly more expanded and documented version of this code as the raw-strings-qq library on Hackage.
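With that library installed, usage could look like the following sketch. It assumes the `raw-strings-qq` package is available and imports `r` from its `Text.RawString.QQ` module. The name `urlPattern` is again just illustrative.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main where

-- Requires the raw-strings-qq package from Hackage.
import Text.RawString.QQ (r)

-- The pattern from the question, written without any extra escaping.
urlPattern :: String
urlPattern = [r|\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b|]

main :: IO ()
main = putStrLn urlPattern
```

One caveat of this approach is that the quoted text cannot contain the sequence `|]`, since that closes the quasiquote.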

answered Oct 26 '22 17:10 by Mikhail Glushenkov