I understand the meaning of | (the pipe special character) in Python regex.
It matches either the first or the second alternative.
Example: a|b
matches either a or b.
My question:
What if I want to match a case-sensitively and b case-insensitively in the example above?
Example:
import re
s = "Welcome to PuNe, Maharashtra"
result1 = re.search("punnee|MaHaRaShTrA",s)
result2 = re.search("pune|maharashtra",s)
result3 = re.search("PuNe|MaHaRaShTrA",s)
result4 = re.search("P|MaHaRaShTrA",s)
I want to search for Pune exactly as it is written in the string s above, i.e. PuNe, but I have to search for Maharashtra ignoring case. How can I search for one word case-sensitively and the other case-insensitively, so that result1, result2, result3, and result4 all give a non-null value?
I tried:
result1 = re.search("pune|MaHaRaShTrA", s, re.IGNORECASE)
But this ignores case for both words.
How can I make one word case-sensitive and the other case-insensitive?
re.IGNORECASE: This flag enables case-insensitive matching of the regular expression against the given string, so expressions like [A-Z] will match lowercase letters too. It is usually passed as an optional argument to re.compile().
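The effect of the flag can be checked with a small sketch. The sample strings and variable names below are illustrative only and are not part of the original post:

```python
import re

# Without the flag, [A-Z]+ matches only the uppercase letter.
without_flag = re.findall(r"[A-Z]+", "Pune")

# With re.IGNORECASE, the same character class also matches lowercase letters.
with_flag = re.findall(r"[A-Z]+", "Pune", re.IGNORECASE)

# The flag can also be passed to re.compile(); it then applies to the whole pattern.
pattern = re.compile(r"maharashtra", re.IGNORECASE)
found = pattern.search("Welcome to PuNe, Maharashtra") is not None

print(without_flag, with_flag, found)
```

Note that the flag passed this way applies to the entire pattern, which is exactly the limitation the question runs into.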
Search patterns are made up of a sequence of characters and can be specified using regex rules. To work with regular expressions in Python, you first need to import the re module. Case-insensitive means that uppercase and lowercase forms of the text are treated as equal.
By default, literal characters in a regular expression pattern are compared with the input string case-sensitively, white space in the pattern is interpreted as literal white-space characters, and capturing groups are numbered implicitly and can also be named explicitly.
In Python 3.6 and later, you can use an inline modifier group:
>>> import re
>>> s = "Welcome to PuNe, Maharashtra"
>>> print(re.findall(r"PuNe|(?i:MaHaRaShTrA)",s))
['PuNe', 'Maharashtra']
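Applied to the four patterns from the question, the same idea might look like the sketch below. The helper name build and the list of pairs are assumptions made for illustration; only the (?i:...) group itself comes from the re module. It requires Python 3.6 or later.

```python
import re

s = "Welcome to PuNe, Maharashtra"

def build(city, state):
    # Hypothetical helper: the first alternative is matched exactly as written,
    # the second is wrapped in a scoped (?i:...) group so only it ignores case.
    return re.compile(r"{}|(?i:{})".format(re.escape(city), re.escape(state)))

# The four (case-sensitive, case-insensitive) word pairs from the question.
pairs = [
    ("punnee", "MaHaRaShTrA"),
    ("pune", "maharashtra"),
    ("PuNe", "MaHaRaShTrA"),
    ("P", "MaHaRaShTrA"),
]

results = [build(city, state).search(s) for city, state in pairs]
matched = [m.group(0) if m else None for m in results]
print(matched)
```

All four searches return a match object: the first two succeed only through the case-insensitive alternative, while the last two find the case-sensitive alternative first because it occurs earlier in the string.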
See the relevant Python re documentation:
(?aiLmsux-imsx:...)
(Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)
The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In byte pattern (?L:...) switches to locale depending matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.
New in version 3.6.
Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
Unfortunately, Python re versions before 3.6 did not support these groups, nor did they support switching inline modifiers on and off within a pattern.
If you can use the PyPI regex module, you may use a (?i:...) construct:
import regex
s = "Welcome to PuNe, Maharashtra"
print(regex.findall(r"PuNe|(?i:MaHaRaShTrA)",s))
See the online Python demo.
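If neither Python 3.6+ nor the regex module is available, one possible workaround (not part of the answer above, and only a sketch) is to spell out the case-insensitive word as a sequence of character classes. The helper name case_insensitive_fragment is hypothetical, and the sketch assumes plain ASCII letters whose upper and lower forms are single characters:

```python
import re

def case_insensitive_fragment(word):
    # Hypothetical helper: turn "Ma" into "[mM][aA]" so that this part of the
    # pattern matches either case without any flag. Assumes ASCII letters.
    parts = []
    for ch in word:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper:
            parts.append("[{}{}]".format(lower, upper))
        else:
            # Non-letters (digits, punctuation) are matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)

s = "Welcome to PuNe, Maharashtra"

# "PuNe" stays case-sensitive; only the second alternative ignores case.
pattern = "PuNe|" + case_insensitive_fragment("MaHaRaShTrA")
matches = re.findall(pattern, s)
print(matches)
```

The generated pattern is longer and harder to read than the (?i:...) form, so it is probably only worth using when the scoped inline flag is unavailable.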