
case-insensitive regular expressions

What's the best way to use regular expressions with options (flags) in Haskell?

I use

Text.Regex.PCRE

The documentation lists a few interesting options like compCaseless, compUTF8, etc., but I don't know how to use them with (=~).

Gaetan Dubar asked Jun 17 '09 15:06




3 Answers

All the Text.Regex.* modules make heavy use of typeclasses, which are there for extensibility and "overloading"-like behavior, but make usage less obvious from just seeing types.

Now, you've probably started with the basic =~ matcher.

(=~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target )
  => source1 -> source -> target
(=~~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target, Monad m )
  => source1 -> source -> m target

To use =~, there must exist an instance of RegexMaker ... for the RHS (the pattern), and RegexContext ... for the LHS (the text being matched) and the result.

class RegexOptions regex compOpt execOpt
      | regex -> compOpt execOpt
      , compOpt -> regex execOpt
      , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
      => RegexMaker regex compOpt execOpt source
         | regex -> compOpt execOpt
         , compOpt -> regex execOpt
         , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex

A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt and execOpt options from some form of source. (Also, given some regex type, there is exactly one compOpt/execOpt pair that goes along with it. Lots of different source types are okay, though.)

class Extract source
class Extract source
      => RegexLike regex source
class RegexLike regex source
      => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target

A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
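As a small illustration of how the target type selects the kind of result, here is a sketch that assumes the regex-pcre package and plain String for both the text and the pattern. The names text, pat, matched, count, firstMatch and parts are made up for this example.

```haskell
import Text.Regex.PCRE ((=~))

-- Hypothetical sample data; both sides are plain Strings.
text, pat :: String
text = "foo bar foo"
pat  = "foo"

-- Bool target: does the pattern match anywhere?
matched :: Bool
matched = text =~ pat

-- Int target: how many matches are there?
count :: Int
count = text =~ pat

-- String target: the text of the first match.
firstMatch :: String
firstMatch = text =~ pat

-- Triple target: (before the match, the match, after the match).
parts :: (String, String, String)
parts = text =~ pat

main :: IO ()
main = do
  print matched     -- True
  print count       -- 2
  print firstMatch  -- "foo"
  print parts       -- ("","foo"," bar foo")
```

The same expression is used four times; only the type annotation changes, which is why the annotation is usually required at the call site.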

Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions

source1 =~ source
  = match (makeRegex source) source1
source1 =~~ source
  = matchM (makeRegex source) source1

and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.

You could make your own

(=~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target )
   => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target, Monad m )
   => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
  = matchM (makeRegexOpts compOpt execOpt source) source1

which could be used like

"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool

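or shadow =~ and =~~ with versions that accept options (a sketch only: it needs extensions such as MultiParamTypeClasses and FlexibleInstances, and the two instances overlap)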

import Text.Regex.PCRE hiding ((=~), (=~~))

class RegexSourceLike regex source
  where
    makeRegexWith :: source -> regex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex source
  where
    makeRegexWith = makeRegex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex (source, compOpt, execOpt)
  where
    makeRegexWith (source, compOpt, execOpt)
      = makeRegexOpts compOpt execOpt source

source1 =~ source
  = match (makeRegexWith source) source1
source1 =~~ source
  = matchM (makeRegexWith source) source1

or you could just use match, makeRegexOpts, etc. directly where needed.
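To make the options above concrete, here is a minimal sketch assuming the regex-pcre package and String inputs. It compiles a caseless regex once with makeRegexOpts and reuses it, and it also includes a version of =~+ specialised to the PCRE Regex type (a simplification of the general signature, chosen to keep type inference simple). The names caselessFoo, inputs, results and viaOperator are hypothetical.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Regex.PCRE

-- Compile once with explicit compile options, then reuse the Regex value.
caselessFoo :: Regex
caselessFoo = makeRegexOpts compCaseless defaultExecOpt ("foo" :: String)

inputs :: [String]
inputs = ["foo", "FOO", "bar"]

-- Apply the same compiled regex to every input.
results :: [Bool]
results = map (match caselessFoo) inputs

-- =~+ specialised to the PCRE Regex type (an assumption of this sketch).
(=~+) :: ( RegexMaker Regex CompOption ExecOption source
         , RegexContext Regex source1 target )
      => source1 -> (source, CompOption, ExecOption) -> target
txt =~+ (pat, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt pat :: Regex) txt

viaOperator :: Bool
viaOperator = ("STRING" :: String) =~+ ("string" :: String, compCaseless, execBlank)

main :: IO ()
main = do
  print results      -- [True,True,False]
  print viaOperator  -- True
```

Compiling once and calling match directly avoids recompiling the pattern on every use, which =~ does implicitly through makeRegex.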

ephemient answered Oct 07 '22 04:10


I don't know anything about Haskell, but if you're using a regex library based on PCRE, then you can use mode modifiers inside the regular expression. To match "caseless" in a case insensitive fashion, you can use this regex in PCRE:

(?i)caseless

The mode modifier (?i) overrides any case sensitivity or case insensitivity option that was set outside the regular expression. It also works with operators that don't allow you to set any options.

Similarly, (?s) turns on "single line mode" which makes the dot match line breaks, (?m) turns on "multi line mode" which makes ^ and $ match at line breaks, and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine the letters. (?ismx) turns on everything. A hyphen turns off options. (?-i) makes the regex case sensitive. (?x-i) starts a free-spacing case sensitive regex.
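In Haskell, this would presumably look like the following sketch, which assumes the regex-pcre package and its (=~) operator on plain Strings. The names withModifier and withoutModifier are made up for the example.

```haskell
import Text.Regex.PCRE ((=~))

-- The inline (?i) modifier makes the pattern caseless without any
-- compile options being passed from the Haskell side.
withModifier :: Bool
withModifier = ("CASELESS" :: String) =~ ("(?i)caseless" :: String)

-- The same pattern without the modifier is case sensitive.
withoutModifier :: Bool
withoutModifier = ("CASELESS" :: String) =~ ("caseless" :: String)

main :: IO ()
main = do
  print withModifier     -- True
  print withoutModifier  -- False
```

Because the modifier lives inside the pattern string, this works with (=~) as-is and needs no custom operator.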

Jan Goyvaerts answered Oct 07 '22 05:10


I believe you cannot use (=~) if you wish to use a compOpt other than defaultCompOpt.

Something like this works:

match (makeRegexOpts compCaseless defaultExecOpt "(Foo)" :: Regex) "foo" :: Bool

The following two articles should assist you:

Real World Haskell, Chapter 8. Efficient file processing, regular expressions, and file name matching

A Haskell regular expression tutorial

davetapley answered Oct 07 '22 05:10