Can you make just part of a regex case-insensitive?

Tags:

regex

I've seen lots of examples of making an entire regular expression case-insensitive. What I'm wondering about is having just part of the expression be case-insensitive.

For example, let's say I have a string like this:

fooFOOfOoFoOBARBARbarbarbAr

What if I want to match all occurrences of "foo" regardless of case but I only want to match the upper-case "BAR"s?

The ideal solution would be something that works across regex flavors but I'm interested in hearing language-specific ones as well (Thanks Espo)

Edit

The link Espo provided was very helpful. There's a good example in there about turning modifiers on and off within the expression.

For my contrived example, I can do something like this:

(?i)foo*(?-i)|BAR

which makes the match case-insensitive for just the foo portion of the match.

That seemed to work in most regex implementations except JavaScript, Python, and a few others (as Espo mentioned).

The big ones that I was wondering about (Perl, PHP, .NET) all support inline mode changes.

482

asked Sep 04 '08 12:09

Mark Biek

4 Answers

Perl lets you make part of your regular expression case-insensitive by using the (?i:) pattern modifier.

Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex, the modifier only applies to the part of the regex to the right of the modifier. You can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.

Not all regex flavors support this. JavaScript and Python apply all mode modifiers to the entire regular expression. They don't support the (?-ismx) syntax, since turning off an option is pointless when mode modifiers apply to the whole regular expression. All options are off by default.

You can quickly test how the regex flavor you're using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
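The quick test above can be sketched in Python. This is only an illustration, assuming Python 3.6+ and its standard re module: as far as I know, re rejects the unscoped (?-i) toggle, so the sketch checks the scoped form (?i:te)st against the same four words instead. The helper name supports_unscoped_toggle is made up for this example.

```python
import re

# Scoped form of the test pattern: only "te" ignores case, "st" stays case-sensitive.
SCOPED = re.compile(r"(?i:te)st")


def supports_unscoped_toggle() -> bool:
    """Return True if this flavor compiles the unscoped (?-i) toggle (hypothetical helper)."""
    try:
        re.compile(r"(?i)te(?-i)st")
        return True
    except re.error:
        return False


# Check each of the four words from the quoted test against the scoped pattern.
results = {word: bool(SCOPED.fullmatch(word)) for word in ["test", "TEst", "teST", "TEST"]}
print(results)  # {'test': True, 'TEst': True, 'teST': False, 'TEST': False}
print(supports_unscoped_toggle())
```

If the helper prints False, the flavor needs the scoped (?i:...) form rather than the on/off toggles.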

Source

101

answered Oct 13 '22 15:10

Espo

It is true one can rely on inline modifiers as described in Turning Modes On and Off for Only Part of The Regular Expression:

The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.

However, a more widely supported feature is the (?i:...) inline modifier group (see Modifier Spans). The syntax is (?i:, then the pattern that you want to make case-insensitive, and then a ).

(?i:foo)|BAR

The reverse: If your pattern is compiled with a case insensitive option and you need to make a part of a regex case sensitive, you add - after ?: (?-i:...).
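A minimal Python sketch of both directions, assuming Python 3.6+ (where re accepts modifier groups). It runs the question's sample string through (?i:foo)|BAR and through the reverse form, where the whole pattern is compiled with re.IGNORECASE and only BAR is opted out with (?-i:...). The variable names are just for illustration.

```python
import re

text = "fooFOOfOoFoOBARBARbarbarbAr"

# Case-insensitive "foo" inside a scoped group; "BAR" stays case-sensitive.
scoped_on = re.findall(r"(?i:foo)|BAR", text)

# Reverse: the whole pattern is case-insensitive, and "BAR" is opted out.
scoped_off = re.findall(r"foo|(?-i:BAR)", text, flags=re.IGNORECASE)

print(scoped_on)   # ['foo', 'FOO', 'fOo', 'FoO', 'BAR', 'BAR']
print(scoped_off)  # ['foo', 'FOO', 'fOo', 'FoO', 'BAR', 'BAR']
```

Both forms find the same six matches and skip the lower-case and mixed-case "bar" runs at the end of the string.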

Example uses in various languages (wrapping the matches with angle brackets):

php - preg_replace("~(?i:foo)|BAR~", '<$0>', "fooFOOfOoFoOBARBARbarbarbAr") (demo)
python - re.sub(r'(?i:foo)|BAR', r'<\g<0>>', 'fooFOOfOoFoOBARBARbarbarbAr') (demo) (note Python re supports inline modifier groups since Python 3.6)
c# / vb.net / .net - Regex.Replace("fooFOOfOoFoOBARBARbarbarbAr", "(?i:foo)|BAR", "<$&>") (demo)
java - "fooFOOfOoFoOBARBARbarbarbAr".replaceAll("(?i:foo)|BAR", "<$0>") (demo)
perl - $s =~ s/(?i:foo)|BAR/<$&>/g (demo)
ruby - "fooFOOfOoFoOBARBARbarbarbAr".gsub(/(?i:foo)|BAR/, '<\0>') (demo)
r - gsub("((?i:foo)|BAR)", "<\\1>", "fooFOOfOoFoOBARBARbarbarbAr", perl=TRUE) (demo)
swift - "fooFOOfOoFoOBARBARbarbarbAr".replacingOccurrences(of: "(?i:foo)|BAR", with: "<$0>", options: [.regularExpression])
go - (uses RE2) - regexp.MustCompile(`(?i:foo)|BAR`).ReplaceAllString( "fooFOOfOoFoOBARBARbarbarbAr", `<${0}>`) (demo)

Not supported in javascript, bash, sed, c++ std::regex, lua, tcl.

In these cases, you can put both letter variants into a character class (not a group, see Why is a character class faster than alternation?). Examples:

sed posix-ere - sed -E 's/[Ff][Oo][Oo]|BAR/<&>/g' file > outfile (demo)
grep posix-ere - grep -Eo '[Ff][Oo][Oo]|BAR' file (or if you are using GNU grep, you can still use the PCRE regex, grep -Po '(?i:foo)|BAR' file (demo))
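Writing the character classes by hand gets tedious for longer words, so here is a rough Python sketch that generates them. The function name to_char_classes is hypothetical, and the output is only checked here against Python's re; characters escaped by re.escape may need different escaping in POSIX ERE tools such as sed or grep.

```python
import re


def to_char_classes(literal: str) -> str:
    """Expand each cased letter into a two-letter class, e.g. "foo" -> "[Ff][Oo][Oo]"."""
    parts = []
    for ch in literal:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            parts.append(f"[{upper}{lower}]")
        else:
            # Caseless characters (digits, punctuation) are matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = to_char_classes("foo") + "|BAR"
print(pattern)  # [Ff][Oo][Oo]|BAR
print(re.findall(pattern, "fooFOOfOoFoOBARBARbarbarbAr"))
```

For plain ASCII words like "foo", the generated pattern is the same one used in the sed and grep examples above.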

answered Oct 13 '22 16:10

Wiktor Stribiżew

What language are you using? A standard way to do this would be something like /([Ff][Oo]{2}|BAR)/ with case sensitivity on, but in Java, for example, there is a case sensitivity modifier (?i) which makes all characters to the right of it case insensitive and (?-i) which forces sensitivity. An example of that Java regex modifier can be found here.

answered Oct 13 '22 16:10

akdom

Unfortunately, the syntax for case-insensitive matching is not common across regex flavors. In .NET you can use the RegexOptions.IgnoreCase flag or the (?i) modifier.

answered Oct 13 '22 15:10

aku
