What's the best way to use regular expressions with options (flags) in Haskell?
I use Text.Regex.PCRE. The documentation lists a few interesting options like compCaseless, compUTF8, ... But I don't know how to use them with (=~).
All the Text.Regex.* modules make heavy use of typeclasses, which are there for extensibility and "overloading"-like behavior, but make usage less obvious from just seeing types.
Now, you've probably started off from the basic =~ matcher.
(=~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target )
  => source1 -> source -> target
(=~~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target, Monad m )
  => source1 -> source -> m target
To use =~, there must exist an instance of RegexMaker ... for the pattern (the right-hand operand), and an instance of RegexContext ... for the text being matched (the left-hand operand) and the result.
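To make the role of the result type concrete, here is a minimal sketch that assumes the regex-pcre package is installed. The function names are hypothetical; the only thing that changes between them is the target type, which selects a different RegexContext instance.

```haskell
import Text.Regex.PCRE ((=~))

-- Target Bool: does the pattern match anywhere in the input?
hasDigits :: String -> Bool
hasDigits s = s =~ "[0-9]+"

-- Target String: the first matching substring ("" if there is no match).
firstDigits :: String -> String
firstDigits s = s =~ "[0-9]+"

-- Target (String, String, String): text before the first match,
-- the match itself, and the text after it.
splitOnDigits :: String -> (String, String, String)
splitOnDigits s = s =~ "[0-9]+"
```

In each case the pattern is a plain String, so the RegexMaker instance for String is the one being used.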
class RegexOptions regex compOpt execOpt
  | regex -> compOpt execOpt
  , compOpt -> regex execOpt
  , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
  => RegexMaker regex compOpt execOpt source
  | regex -> compOpt execOpt
  , compOpt -> regex execOpt
  , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex
A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt,execOpt options from some form source. (Also, given some regex type, there is exactly one compOpt,execOpt set that goes along with it. Lots of different source types are okay, though.)
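As a rough illustration of the compile step on its own, the sketch below builds a Regex from a String twice: once with makeRegex and once with makeRegexOpts. It assumes regex-pcre is installed; the names plainRegex and caselessRegex are made up for this example.

```haskell
import Text.Regex.PCRE

-- Compile with the library's default compile and exec options.
plainRegex :: String -> Regex
plainRegex = makeRegex

-- Compile with an explicit compile option (caseless matching)
-- and the default exec options.
caselessRegex :: String -> Regex
caselessRegex = makeRegexOpts compCaseless defaultExecOpt
```

With these, match (caselessRegex "foo") "FOO" :: Bool should succeed where the plainRegex version does not. Note that passing compCaseless alone replaces the default compile options rather than adding to them.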
class Extract source
class Extract source
  => RegexLike regex source
class RegexLike regex source
  => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target
A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
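The matching step can be sketched the same way: one compiled Regex, several targets. This assumes regex-pcre is installed, and the names below are hypothetical. The matchM version uses Maybe as the monad, so a failed match shows up as Nothing.

```haskell
import Text.Regex.PCRE

-- One compiled regex, reused by every function below.
wordRegex :: Regex
wordRegex = makeRegex "[a-z]+"

-- Target Bool: is there at least one match?
anyWord :: String -> Bool
anyWord = match wordRegex

-- Target Int: how many non-overlapping matches are there?
countWords :: String -> Int
countWords = match wordRegex

-- matchM in Maybe: Just the first match, or Nothing if there is none.
firstWord :: String -> Maybe String
firstWord = matchM wordRegex
```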
Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions
source1 =~ source
  = match (makeRegex source) source1
source1 =~~ source
  = matchM (makeRegex source) source1
and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.
You could make your own
(=~+) ::
  ( RegexMaker regex compOpt execOpt source
  , RegexContext regex source1 target )
  => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
  ( RegexMaker regex compOpt execOpt source
  , RegexContext regex source1 target, Monad m )
  => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
  = matchM (makeRegexOpts compOpt execOpt source) source1
which could be used like
"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool
or overwrite =~ and =~~ with methods which can accept options
-- Sketch only: needs MultiParamTypeClasses, FlexibleInstances,
-- UndecidableInstances and overlapping instances to be enabled.
import Text.Regex.PCRE hiding ((=~), (=~~))
class RegexSourceLike regex source
  where
    makeRegexWith :: source -> regex
instance RegexMaker regex compOpt execOpt source
  => RegexSourceLike regex source
  where
    makeRegexWith = makeRegex
instance RegexMaker regex compOpt execOpt source
  => RegexSourceLike regex (source, compOpt, execOpt)
  where
    makeRegexWith (source, compOpt, execOpt)
      = makeRegexOpts compOpt execOpt source
source1 =~ source
  = match (makeRegexWith source) source1
source1 =~~ source
  = matchM (makeRegexWith source) source1
or you could just use match, makeRegexOpts, etc. directly where needed.
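For the first of these options, here is a compact sketch of a custom operator, specialised to the PCRE types so that no type annotation on the regex is needed. It assumes regex-pcre is installed; the fixity declaration and the shoutsHello helper are my own additions for illustration.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Regex.PCRE

infix 4 =~+

-- Like (=~), but the right operand carries the compile and exec options.
(=~+) :: ( RegexMaker Regex CompOption ExecOption source
         , RegexContext Regex source1 target )
      => source1 -> (source, CompOption, ExecOption) -> target
input =~+ (pat, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt pat) input

-- Hypothetical usage: case-insensitive search for "hello".
shoutsHello :: String -> Bool
shoutsHello s = s =~+ ("hello", compCaseless, execBlank)
```

Several compile options can be combined as in the snippet above with +. If CompOption also has a Bits instance in your version of the library (I believe it does, but treat that as an assumption), .|. from Data.Bits is the safer way to combine flags.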
I don't know anything about Haskell, but if you're using a regex library based on PCRE, then you can use mode modifiers inside the regular expression. To match "caseless" in a case insensitive fashion, you can use this regex in PCRE:
(?i)caseless
The mode modifier (?i) overrides any case sensitivity or case insensitivity option that was set outside the regular expression. It also works with operators that don't allow you to set any options.
Similarly, (?s) turns on "single line mode" which makes the dot match line breaks, (?m) turns on "multi line mode" which makes ^ and $ match at line breaks, and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine the letters. (?ismx) turns on everything. A hyphen turns off options. (?-i) makes the regex case sensitive. (?x-i) starts a free-spacing case sensitive regex.
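On the Haskell side this means plain =~ is enough, since the modifiers travel inside the pattern string. A small sketch, assuming regex-pcre is installed (the function names are made up):

```haskell
import Text.Regex.PCRE ((=~))

-- (?i) makes the whole pattern case insensitive.
matchesCaseless :: String -> Bool
matchesCaseless s = s =~ "(?i)caseless"

-- (?s) lets the dot match a line break as well.
matchesAcrossLines :: String -> Bool
matchesAcrossLines s = s =~ "(?s)a.b"

-- The same pattern without the modifier, for comparison.
matchesOnOneLine :: String -> Bool
matchesOnOneLine s = s =~ "a.b"
```

For example, matchesAcrossLines "a\nb" holds while matchesOnOneLine "a\nb" does not.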
I believe you cannot use (=~) if you wish to use a compOpt other than defaultCompOpt.
Something like this should work:
match (makeRegexOpts compCaseless defaultExecOpt "(Foo)" :: Regex) "foo" :: Bool
The following two articles should assist you:
Real World Haskell, Chapter 8. Efficient file processing, regular expressions, and file name matching
A Haskell regular expression tutorial