Javascript Regex ignore case for specific capture group

In PCRE this would be a valid expression

/^\!(foo|bar) ((?i)ab|cd|ef|gh)$/

But in JavaScript this is not valid. Unfortunately I don't know what (?i) is called, so I'm having trouble googling it. How would I translate this example into a valid JavaScript regex?


What I actually want to do:

Find all lines that start with !foo or !bar, followed by a space, and end with ab, cd, ef or gh. Only the ending should be case insensitive.

!foo CD
!foo cD
!foo cd

would all be valid. While

!FOO cd
!Foo cd

would be invalid

boop asked Jan 04 '16 02:01


2 Answers

You can download the ECMAScript (JavaScript) documentation from here:

https://www.ecma-international.org/publications/standards/Ecma-262.htm

The RegExp grammar is defined there, and it does not include Perl's advanced extensions, so inline flags such as (?i) are not supported.

One way to do what you want is to use [...] for each character that needs to match in either case:

(?i)ab   becomes   [aA][bB]

It's a lot more typing, but I do not know of a better solution.

If the entire regex may match in any case, you can use the i flag:

/ab/i

But in your example, that means "foo" would also be accepted as "Foo" or "fOO".
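
The bracket rewrite can be generated instead of typed by hand. This is a minimal sketch, not a definitive implementation: caseInsensitive is a hypothetical helper name (it is not part of JavaScript), and it assumes the input consists of plain ASCII letters with no regex metacharacters.

```javascript
// Hypothetical helper: turn a literal such as "ab" into "[aA][bB]".
// Assumes ASCII input without regex metacharacters.
function caseInsensitive(literal) {
  return literal.replace(/[a-z]/gi, (ch) => {
    return "[" + ch.toLowerCase() + ch.toUpperCase() + "]";
  });
}

// Only the trailing alternatives are made case insensitive.
const tail = ["ab", "cd", "ef", "gh"].map(caseInsensitive).join("|");
const pattern = new RegExp("^!(foo|bar) (" + tail + ")$");

console.log(caseInsensitive("ab"));   // [aA][bB]
console.log(pattern.test("!foo cD")); // true
console.log(pattern.test("!Foo cd")); // false
```

Because the i flag is never set, the foo|bar part stays case sensitive.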


Newer versions of JavaScript do support inline flags, but only in the scoped form (?flags:...), for example (?i:ab).

  • DotAll is true if the RegExp object's [[OriginalFlags]] internal slot contains "s" and otherwise is false.
  • IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.
  • Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.
  • Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.

So the approach in Giuseppe Ricupero's answer carries over to new browsers, Node, and so on, provided the flag is written in the scoped form (?i:...). The bare (?i) and (?-i) toggles are still not valid JavaScript.
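
Since older engines reject the scoped modifier with a SyntaxError, one option is to detect support at runtime. The following is a sketch under that assumption: buildPattern is a hypothetical name, and the fallback is the bracket rewrite described above. JavaScript accepts only the scoped form (?i:...); the unscoped (?i) is not valid.

```javascript
// Returns a regex for the question's pattern. Uses the scoped modifier
// (?i:...) when the engine understands it, otherwise falls back to
// explicit character classes.
function buildPattern() {
  try {
    // new RegExp throws a SyntaxError on engines without modifier support,
    // which is why the pattern is passed as a string, not a literal.
    return new RegExp("^!(foo|bar) (?i:(ab|cd|ef|gh))$");
  } catch (err) {
    return /^!(foo|bar) ([Aa][Bb]|[Cc][Dd]|[Ee][Ff]|[Gg][Hh])$/;
  }
}

const re = buildPattern();
console.log(re.test("!foo CD")); // true
console.log(re.test("!FOO cd")); // false
```

Both branches keep the same two capture groups, so code that reads the match result does not need to know which one was used.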

Alexis Wilke answered Oct 13 '22 07:10


The (?i) is the case-insensitive flag: from the point in the regex where it is placed, it makes every character class containing letters, e.g. [a-z], also match [A-Z] (and vice versa). This also works for a single letter a (matches a and A) or a sequence ab (matches ab, Ab, aB, AB).

So in PCRE you can put it at the beginning of your regex, /(?i)regex/ (making it equivalent to the JS /regex/i), or you can combine it with its opposite, (?-i), to make only part of the regex case-insensitive:

/^(?i)[a-z]{2}(?-i)[a-z]{2}/ 

The regex above matches 2 uppercase or lowercase chars followed by 2 strictly lowercase chars.

Matches ->        ROck, rOck, Rock
Does not match -> ROCK, roCk, rOcK
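
The pattern above is PCRE. As a hedged sketch of what a JavaScript translation could look like, the case-insensitive part can be spelled out as [a-zA-Z] (the variable name rock is only illustrative):

```javascript
// JavaScript counterpart of the PCRE pattern /^(?i)[a-z]{2}(?-i)[a-z]{2}/:
// the case-insensitive half becomes [a-zA-Z], the rest stays [a-z].
const rock = /^[a-zA-Z]{2}[a-z]{2}/;

for (const word of ["ROck", "rOck", "Rock", "ROCK", "roCk", "rOcK"]) {
  console.log(word, rock.test(word));
}
// ROck, rOck, Rock -> true; ROCK, roCk, rOcK -> false
```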

What about your PCRE regex?

/^\!(foo|bar) ((?i)ab|cd|ef|gh)$/

If you don't mind also matching strings that start with !Foo, !FOo, !foO, !fOO, !BAR, !Bar, ... you can put the flag outside, like this:

/^!(foo|bar) (ab|cd|ef|gh)$/i   // the escape in \! is unnecessary, so it is written as ! here

If you instead want the exact equivalent of the original PCRE regex (/^!(foo|bar) ((?i)ab|cd|ef|gh)$/), the JS version is less readable:

/^!(foo|bar) ([Aa][Bb]|[Cc][Dd]|[Ee][Ff]|[Gg][Hh])$/
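
To check this regex against the sample lines from the question, a short sketch such as the following can be used (the variable names are illustrative):

```javascript
const pattern = /^!(foo|bar) ([Aa][Bb]|[Cc][Dd]|[Ee][Ff]|[Gg][Hh])$/;

const input = ["!foo CD", "!foo cD", "!foo cd", "!FOO cd", "!Foo cd"].join("\n");

// Test line by line, so the anchors ^ and $ apply to each line.
const matches = input.split("\n").filter((line) => pattern.test(line));

console.log(matches); // [ '!foo CD', '!foo cD', '!foo cd' ]
```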
Giuseppe Ricupero answered Oct 13 '22 08:10