
Regex matching uppercase characters with lowercase search

I'm using Notepad++, and I'm finding that when I use a regex to search for strings where I specifically want lowercase letters ("[a-z]"), it sometimes returns uppercase letters.

I originally was searching using the string:

^[A-Z][a-z].+?$

The purpose was to delete any line in my file that began with an uppercase character, followed by a lowercase character, followed by anything until the end of the line. However, this returned lines like "CLONE" and "DISEASE", which contain only capital letters. Out of curiosity, I tried:

^[a-z].+?$

And it still returned those lines in all-caps. Finally, I tried:

^[\u0061-\u007A].+?$

And it still returned lines of all-caps text. Is there something outside of my brackets that's causing this to happen?

Phil Dinius asked Aug 19 '14 12:08



1 Answer

Like many other text editors, Notepad++ provides a global Match case option. Even if your expression does not contain the inline modifier (?i), the results depend on whether Match case is ON or OFF, which can be surprising.

So your all-caps lines are a valid match for ^[A-Z][a-z].+?$, because letters are matched case-insensitively when Match case is OFF.

Check the Match case box in the Find dialog to enable case sensitivity for regex search.
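
To see the effect outside the editor, here is a minimal sketch in Python. It is only an analogy: Python's re module is not the engine Notepad++ uses, and re.IGNORECASE is assumed here to stand in for Match case being OFF. The sample lines are made up for illustration.

```python
import re

# The pattern from the question: uppercase, lowercase, then anything to end of line.
PATTERN = r"^[A-Z][a-z].+?$"

# Hypothetical sample lines, two of them taken from the question.
lines = ["CLONE", "DISEASE", "Clone", "clone"]

# re.IGNORECASE plays the role of Match case OFF: [A-Z] and [a-z] both
# match letters of either case, so every line is reported.
insensitive = [s for s in lines if re.search(PATTERN, s, re.IGNORECASE)]

# Without the flag (Match case ON), only the line that really starts with
# an uppercase letter followed by a lowercase one is reported.
sensitive = [s for s in lines if re.search(PATTERN, s)]

print(insensitive)
print(sensitive)
```

With case-insensitive matching the character classes stop distinguishing case at all, which is why even ^[a-z].+?$ kept returning the all-caps lines.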

OTHER WAYS TO OVERRIDE CASE SENSITIVITY

There are inline flags you may use with some regex flavors to hardcode case sensitivity for all or part of the pattern:

  • (?-i)[A-Z][a-z]* matches only an uppercase letter followed by zero or more lowercase ones, because (?-i) turns case sensitivity ON
  • (?i)[A-Z][a-z]* matches one or more letters of either case, because (?i) turns case sensitivity OFF
  • (?-i)[a-z](?i)[a-f](?-i)[a-z] matches a lowercase letter, then a letter from a to f in either case (a to f or A to F), and then another lowercase letter
  • S(?i:[a-z])S matches S (or also s, depending on environment settings such as Match case), then any letter of either case, and then S (or s, under the same setting)
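
The same idea can be tried in Python as a rough sketch. This is an assumption-laden analogy, not Notepad++ behavior: Python's re module is a different flavor, and it does not accept a flag toggled in the middle of a pattern, so the scoped forms (?i:...) and (?-i:...) are used instead. The helper name full is hypothetical.

```python
import re

def full(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text, flags) is not None

# (?-i:...) keeps the group case-sensitive even when IGNORECASE is set
# globally, much like Match case being OFF in the editor.
print(full(r"(?-i:[A-Z][a-z]*)", "Clone", re.IGNORECASE))  # True
print(full(r"(?-i:[A-Z][a-z]*)", "CLONE", re.IGNORECASE))  # False

# A leading (?i) makes the whole pattern case-insensitive.
print(full(r"(?i)[A-Z][a-z]*", "cLoNe"))  # True

# Only the middle letter ignores case; the outer letters must be lowercase.
print(full(r"[a-z](?i:[a-f])[a-z]", "xBy"))  # True
print(full(r"[a-z](?i:[a-f])[a-z]", "XBy"))  # False

# The literal S follows the global setting; the scoped group never does.
print(full(r"S(?i:[a-z])S", "SxS"))                 # True
print(full(r"S(?i:[a-z])S", "sxs"))                 # False
print(full(r"S(?i:[a-z])S", "sxs", re.IGNORECASE))  # True
```

Scoped groups have the advantage that the flag change ends with the closing parenthesis, so the rest of the pattern keeps whatever setting was in force before.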
Wiktor Stribiżew answered Sep 21 '22 03:09