python regular expression: match either one of several regular expressions

Tags: python, regex

I have a string and three patterns that I want to match against it using Python's re module. Specifically, if any one of the patterns is found, the output should be "Dislikes"; otherwise, it should be "Likes". A brief description of the three patterns:

pattern 1: check whether the string contains any character that is not an uppercase letter

pattern 2: check whether two consecutive characters are the same, for example AA, BB...

pattern 3: check whether the pattern XYXY exists, where X and Y can be the same letter and the letters in this pattern do not need to be next to each other.

When I write the patterns separately, the program runs as expected. But when I combine the three patterns using alternation (|), the result is wrong. I have checked other Stack Overflow posts, for example here and here, but the solutions provided there do not work for me.

Here is the original code that works fine:

import sys
import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")

    word = sys.stdin.readline()
    word = word.rstrip('\n')
    if pattern1.search(word) or pattern2.search(word) or pattern3.search(word):
        print("Dislikes")
    else:
        print("Likes")

If I combine the three patterns into one using the following code, something goes wrong:

import sys
import re

if __name__ == "__main__":

    pattern = r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2|([A-Z])\1|[^A-Z]+"

    word = sys.stdin.readline()

    word = word.rstrip('\n')
    if re.search(word, pattern):
        print("Dislikes")
    else:
        print("Likes")

If we call the three patterns p1, p2, and p3, I also tried the following combinations:

pattern = r"(p1|p2|p3)"
pattern = r"(p1)|(p2)|(p3)"

But these do not work as expected either. What is the correct way to combine them?

Test cases:

"Likes": ABC, ABCD, A, ABCBA
"Dislikes": ABBC (pattern 2), THETXH (pattern 3), ABACADA (pattern 3), AbCD (pattern 1)

asked Sep 04 '17 14:09

jdhao

1 Answer

Here is a single pattern that joins yours:

([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)

So, why does it work?

It consists of a simple (p1|p2|p3) pattern, where p1, p2 and p3 are those you defined before:

[^A-Z]+
([A-Z])\1
([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2

It can be decomposed as:

(
  [^A-Z]+
 |([A-Z])\2
 |([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4
)

The problem you were encountering is the numbering of the groups.

First off, when you combine p2 and p3, both refer to \1, but it denotes a different group in each pattern. Therefore, p3 should become ...\2...\3, since p2 contributes an additional group before it.
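
To see the numbering clash concretely, here is a small sketch that runs the combined pattern from the question and a renumbered variant on ABBC. The names naive and fixed are illustrative and not from the original post. In the question's ordering (p3|p2|p1), the group in ([A-Z])\1 becomes the third group, so its backreference has to be \3.

```python
import re

# Pattern from the question: p3|p2|p1 joined without renumbering.
# The group in the p2 alternative is now group 3, but \1 still points at
# group 1 (inside p3), which is unset whenever the p2 alternative is tried.
naive = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2|([A-Z])\1|[^A-Z]+")

# Same alternatives, with p2's backreference renumbered to its own group.
fixed = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2|([A-Z])\3|[^A-Z]+")

word = "ABBC"
print(naive.search(word))          # None: the BB pair is missed
print(fixed.search(word).group())  # BB
```

A backreference to a group that did not take part in the match simply fails in Python's re module, which is why the naive version silently misses p2 instead of raising an error.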

Furthermore, the group indices referred to by \number are assigned in the order in which the groups are opened. As a consequence, the very first parenthesis, which opens the outer (...|...|...), is counted as the first group, and \1 refers to it. Of course, this is not what you want. In addition, it raises an error, because \1 then refers to a group that has not been closed yet and is thus not yet defined.
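
The compile-time error can be reproduced with a short sketch. The helper compiles and the names unshifted and shifted are hypothetical, introduced only for this illustration.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Outer group added, but backreferences left unshifted: \1 points at the
# still-open outer group, which the re module rejects at compile time.
unshifted = r"([^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3)"

# Backreferences shifted by one to account for the outer group.
shifted = r"([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)"

print(compiles(unshifted))  # False
print(compiles(shifted))    # True
```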

Therefore, the indices should be shifted by one, becoming \2, \3 and \4.

Such A|B regexes are often wrapped in parentheses, but the outer ones can be dropped here, with the indices shifted back by one:

[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3
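
One way to sidestep renumbering altogether is to use named groups, which Python's re module supports through (?P<name>...) and (?P=name). The sketch below is an alternative formulation, not part of the original answer, and the group names dup, x and y are arbitrary choices.

```python
import re

# Each alternative refers to its own groups by name, so adding, removing
# or reordering alternatives does not change any backreference.
named = re.compile(
    r"[^A-Z]+"                                                  # p1
    r"|(?P<dup>[A-Z])(?P=dup)"                                  # p2
    r"|(?P<x>[A-Z])[A-Z]*(?P<y>[A-Z])[A-Z]*(?P=x)[A-Z]*(?P=y)"  # p3
)

for word in ["ABC", "ABBC", "THETXH", "AbCD"]:
    print(word, bool(named.search(word)))
```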

Here is a small demonstration of this pattern:

import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")
    # Combined pattern: the outer group is group 1, so the inner
    # backreferences are shifted to \2, \3 and \4.
    pattern = re.compile(r"([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)")

    while True:
        try:
            word = input("> ")
        except EOFError:
            break  # stop cleanly at end of input
        # Print the result of each individual pattern, then the combined one.
        print(pattern1.search(word))
        print(pattern2.search(word))
        print(pattern3.search(word))
        print(pattern.search(word))

Interactive session:

> ABC    # Matches no pattern
None
None
None
None

> ABCBA  # Matches no pattern
None
None
None
None

> ABBC   # Matches p2
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # p2 is matched
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # The combined pattern gives the same match

> ABACADA # Matches p3
None
None
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # p3 is matched
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # The combined pattern gives the same match
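
For a non-interactive check, the test cases from the question can be run against the flat combined pattern. This is a minimal sketch; COMBINED and verdict are hypothetical names. Note that re.search takes the pattern first and the string second, whereas the combined snippet in the question passes them in the opposite order, which on its own would give wrong results.

```python
import re

# Flat combined pattern (p1|p2|p3) without the outer group.
COMBINED = re.compile(r"[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3")

def verdict(word):
    """Return "Dislikes" if any of the three patterns occurs in word."""
    return "Dislikes" if COMBINED.search(word) else "Likes"

# Test cases listed in the question.
for word in ["ABC", "ABCD", "A", "ABCBA", "ABBC", "THETXH", "ABACADA", "AbCD"]:
    print(word, verdict(word))
```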

answered Oct 06 '22 19:10

Right leg
