
Make one word case sensitive and the other case insensitive in a Python regex | (pipe) alternation

Tags:

python

regex

I understand what | (the pipe special character) means in a Python regex: it matches either the first or the second alternative.

ex: a|b matches either a or b.

My question: what if I want to match a case-sensitively and b case-insensitively in the example above?

ex:

import re

s = "Welcome to PuNe, Maharashtra"

result1 = re.search("punnee|MaHaRaShTrA",s)
result2 = re.search("pune|maharashtra",s)
result3 = re.search("PuNe|MaHaRaShTrA",s)
result4 = re.search("P|MaHaRaShTrA",s)

I want to match Pune exactly as it is written in the string s above, i.e. PuNe, but match Maharashtra while ignoring case. How can I search for one word case-sensitively and the other case-insensitively, so that result1, result2, result3 and result4 are all non-None?

I tried:

result1 = re.search("pune|MaHaRaShTrA", s, re.IGNORECASE)

But this ignores case for both words.

How can I make one word case sensitive and the other case insensitive?

Harsha Biyani asked Jul 04 '17 09:07




1 Answer

In Python 3.6 and later, you may use the inline modifier groups:

>>> import re
>>> s = "Welcome to PuNe, Maharashtra"
>>> print(re.findall(r"PuNe|(?i:MaHaRaShTrA)",s))
['PuNe', 'Maharashtra']
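Applied to the four patterns from the question, the same idea might look like the sketch below. It assumes Python 3.6 or later; wrapping only the second alternative in (?i:...) is one reading of the requirement, and the names patterns and results are purely illustrative.

```python
import re

s = "Welcome to PuNe, Maharashtra"

# The first alternative stays case sensitive; only the second one is
# wrapped in a scoped (?i:...) group, so it alone ignores case.
patterns = [
    r"punnee|(?i:MaHaRaShTrA)",
    r"pune|(?i:maharashtra)",
    r"PuNe|(?i:MaHaRaShTrA)",
    r"P|(?i:MaHaRaShTrA)",
]

results = [re.search(p, s) for p in patterns]
for p, m in zip(patterns, results):
    # Every pattern finds something, so none of the results is None.
    print(p, "->", m.group() if m else None)
```

The first two patterns match through the case-insensitive alternative only, while the last two match through the case-sensitive one because it occurs earlier in the string.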

See the relevant Python re documentation:

(?aiLmsux-imsx:...)
   (Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In byte pattern (?L:...) switches to locale depending matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.

New in version 3.6.

Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
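The '-' part of that syntax can be used in the other direction as well. As a rough sketch, assuming Python 3.6+, you may pass re.I for the whole pattern and switch case sensitivity back on for one alternative with (?-i:...). The variable names below are illustrative only.

```python
import re

s = "Welcome to PuNe, Maharashtra"

# re.I makes the whole pattern case insensitive; (?-i:...) removes the
# flag again, but only for the part inside the group.
exact_spelling = re.findall(r"(?-i:PuNe)|maharashtra", s, re.I)
wrong_spelling = re.findall(r"(?-i:pune)|maharashtra", s, re.I)

print(exact_spelling)
print(wrong_spelling)
```

In the second call the lowercase pune no longer matches PuNe, because that group is case sensitive again, while maharashtra still matches regardless of case.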

Unfortunately, Python re versions before 3.6 did not support these groups, nor did they support switching inline modifiers on and off within a pattern.
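On such older versions, one possible workaround (a different technique from the inline groups above) is to spell out the case-insensitive word as character classes. The sketch below assumes plain ASCII letters; ignore_case is a hypothetical helper written for this example, not part of the re module.

```python
import re

def ignore_case(word):
    """Hypothetical helper: turn each letter into a two-letter class, e.g. m -> [mM]."""
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[%s%s]" % (ch.lower(), ch.upper()))
        else:
            # Escape anything that is not a letter so it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)

s = "Welcome to PuNe, Maharashtra"

# "PuNe" is left as written (case sensitive); only the second word is expanded.
pattern = "PuNe|" + ignore_case("MaHaRaShTrA")
found = re.findall(pattern, s)
print(found)
```

This needs no flags at all, at the cost of a longer and less readable pattern.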

If you can use the PyPI regex module, you may use the (?i:...) construct there:

import regex
s = "Welcome to PuNe, Maharashtra"
print(regex.findall(r"PuNe|(?i:MaHaRaShTrA)",s))

See the online Python demo.

Wiktor Stribiżew answered Oct 01 '22 12:10
