
How to set ignorecase flag for part of regular expression in Python?

Tags: python, regex

Is it possible to implement something like this simple Perl snippet in Python?

#!/usr/bin/perl
my $a = 'Use HELLO1 code';
if($a =~ /(?i:use)\s+([A-Z0-9]+)\s+(?i:code)/){
    print "$1\n";
}

The letters of the token in the middle of the string are always uppercase. The letters of the other words can be in any case (USE, use, Use, CODE, code, Code and so on).

Dmitry Nedbaylo asked Sep 21 '09 15:09


3 Answers

Since Python 3.6 you can set flags inside a group:

(?imsx-imsx:...)

(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or remove the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for that part of the expression.

Thus (?i:use) is now valid syntax. From a Python 3.6 terminal:

>>> import re
>>> regex = re.compile(r'(?i:use)\s+([A-Z0-9]+)\s+(?i:code)')
>>> regex.match('Use HELLO1 code')
<_sre.SRE_Match object; span=(0, 15), match='Use HELLO1 code'>
>>> regex.match('use HELLO1 Code')
<_sre.SRE_Match object; span=(0, 15), match='use HELLO1 Code'>
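
The same group syntax can also be used in the other direction: compile the whole pattern with re.I and switch case-insensitivity off only around the token with (?-i:...). This is a minimal sketch, assuming Python 3.6 or later; the names `pattern` and `extract` are only illustrative.

```python
import re

# The whole pattern ignores case, except the group wrapped in (?-i:...),
# where the 'i' flag is removed and [A-Z0-9] is matched case-sensitively.
pattern = re.compile(r'use\s+(?-i:([A-Z0-9]+))\s+code', re.I)


def extract(s):
    """Return the uppercase token, or None if the string does not match."""
    m = pattern.match(s)
    return m.group(1) if m else None


if __name__ == '__main__':
    for s in ['USE HELLO1 Code', 'use hello1 code']:
        print(s, '->', extract(s))
```

Which of the two forms reads better probably depends on how much of the pattern is case-insensitive: when most of it is, removing the flag around one small group keeps the expression shorter.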
Thomas Perrot answered Oct 20 '22 03:10


As far as I could find, the Python regular expression engine (before version 3.6) does not support partial ignore-case. Here is a workaround that uses a case-insensitive regular expression and then tests whether the token is uppercase.

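#!/usr/bin/env python3

import re

# Match the whole phrase case-insensitively; the token's case is checked afterwards.
token_re = re.compile(r'use\s+([a-z0-9]+)\s+code', re.IGNORECASE)


def find_token(s):
    """Return the token if str.isupper() accepts it, else None."""
    m = token_re.search(s)
    if m is not None:
        token = m.group(1)
        # Note: isupper() is False for a digits-only token such as '42'.
        if token.isupper():
            return token
    return None


if __name__ == '__main__':
    for s in ['Use HELLO1 code',
              'USE hello1 CODE',
              'this does not match',
              ]:
        print(s, '->', find_token(s))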

Here is the program's output:

Use HELLO1 code -> HELLO1
USE hello1 CODE -> None
this does not match -> None
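
One difference from the Perl original is worth noting: str.isupper() returns False for a token made only of digits, while [A-Z0-9]+ would accept it. A possible variant, sketched here with the hypothetical names `phrase_re`, `upper_token_re` and `find_token_strict`, replaces the isupper() test with a second, case-sensitive match on the captured token.

```python
import re

# First pass: locate the phrase without regard to case.
phrase_re = re.compile(r'use\s+([a-z0-9]+)\s+code', re.IGNORECASE)
# Second pass: the captured token must consist of uppercase letters and digits only.
upper_token_re = re.compile(r'[A-Z0-9]+')


def find_token_strict(s):
    """Return the token if it matches [A-Z0-9]+ case-sensitively, else None."""
    m = phrase_re.search(s)
    if m is None:
        return None
    token = m.group(1)
    if upper_token_re.fullmatch(token):
        return token
    return None


if __name__ == '__main__':
    for s in ['Use HELLO1 code', 'USE hello1 CODE', 'use 42 code']:
        print(s, '->', find_token_strict(s))
```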
Christian Oudard answered Oct 20 '22 02:10


According to the docs (as of Python versions before 3.6), this is not possible. The inline flag syntax, such as (?i), only lets you set a flag for the whole expression. Therefore, you must either split this into three regular expressions and apply them one after the other, or do the "ignore case" manually: /[uU][sS][eE]...
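
The manual approach can be automated so the character classes do not have to be typed by hand. The sketch below assumes plain ASCII words; `any_case` is a hypothetical helper name, not part of the re module.

```python
import re


def any_case(word):
    """Build a pattern matching `word` in any letter case, e.g. 'use' -> '[uU][sS][eE]'."""
    parts = []
    for ch in word:
        if ch.isalpha():
            # One character class per letter, listing both cases.
            parts.append('[' + ch.lower() + ch.upper() + ']')
        else:
            # Non-letters are matched literally.
            parts.append(re.escape(ch))
    return ''.join(parts)


# No flags are needed: only the outer words tolerate mixed case.
pattern = re.compile(any_case('use') + r'\s+([A-Z0-9]+)\s+' + any_case('code'))

if __name__ == '__main__':
    for s in ['uSe HELLO1 CoDe', 'use hello1 code']:
        m = pattern.match(s)
        print(s, '->', m.group(1) if m else None)
```

This form also runs on interpreters that lack scoped inline flags, at the cost of a less readable compiled pattern.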

Aaron Digulla answered Oct 20 '22 01:10