How to tell if one regular expression matches a subset of another regular expression?

Tags:

python

regex

I'm wondering if it's possible to use one regular expression to match another, that is, something like:

['a-z'].match(['b-x'])
True

['m-n'].match(['0-9'])
False

Is this sort of thing possible with regex at all? I'm working in Python, so any advice specific to the re module's implementation would help, but I'll take anything I can get concerning regex.

Edit: Ok, some clarification is obviously in order! I definitely know that normal matching syntax would look something like this:

import re

expr = re.compile(r'[a-z]*')
string = "some words"
expr.match(string)
# => a match object (here it matches the leading "some")

but I'm wondering whether one regular expression can be checked against another expression that matches only a subset of its strings. In the syntactically invalid version I tried to explain with above, any letter from b-x would always be a subset (match) of any letter from a-z. I know just from trying that this isn't something you can do by calling the match of one compiled expression on another compiled expression, but the question remains: is this at all possible?

Let me know if this still isn't clear.

213

asked Jun 15 '11 19:06

NSU

2 Answers

I think that, in theory, to tell whether regexp A matches a subset of what regexp B matches, an algorithm could:

1. Compute the minimal Deterministic Finite Automaton (DFA) of B and also of the "union" A|B.
2. Check if the two DFAs are identical (up to renaming of states). This is true if and only if A matches a subset of what B matches.
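
The check in the two steps above can be sketched in Python for DFAs that are already built by hand. This is only an illustration under stated assumptions: the helper names (make_dfa, step, is_subset) and the state names are hypothetical, and instead of minimizing both automata and comparing them, the sketch walks the reachable pairs of states and verifies that A|B and B accept in exactly the same places, which tests the same condition L(A|B) = L(B). It does not translate a regex into a DFA.

```python
from collections import deque

def make_dfa(start, accepting, delta):
    """delta maps (state, symbol) -> next state; a missing entry means 'reject'."""
    return {"start": start, "accepting": frozenset(accepting), "delta": delta}

def step(dfa, state, symbol):
    # None stands for the implicit dead (rejecting) state.
    if state is None:
        return None
    return dfa["delta"].get((state, symbol))

def is_subset(dfa_a, dfa_b, alphabet):
    """True if every string accepted by dfa_a is also accepted by dfa_b."""
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        state_a, state_b = queue.popleft()
        in_a = state_a in dfa_a["accepting"]
        in_b = state_b in dfa_b["accepting"]
        # A|B accepts here iff in_a or in_b; that must agree with B alone.
        if (in_a or in_b) != in_b:
            return False
        for symbol in alphabet:
            nxt = (step(dfa_a, state_a, symbol), step(dfa_b, state_b, symbol))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Hand-built DFAs (hypothetical state names) for 55* and 5* over the alphabet {4, 5}.
dfa_55_star = make_dfa("start", {"seen5"},
                       {("start", "5"): "seen5", ("seen5", "5"): "seen5"})
dfa_5_star = make_dfa("any", {"any"}, {("any", "5"): "any"})

print(is_subset(dfa_55_star, dfa_5_star, "45"))  # True
print(is_subset(dfa_5_star, dfa_55_star, "45"))  # False
```

The second call fails immediately at the start states, because 5* accepts the empty string and 55* does not.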

However, it would likely be a major project to do this in practice. There are explanations such as "Constructing a minimum-state DFA from a Regular Expression", but they tend to consider only mathematically pure regexps. You would also have to handle the extensions that Python adds for convenience. Moreover, if any of the extensions cause the language to be non-regular (I am not sure if this is the case), you might not be able to handle those.

But what are you trying to do? Perhaps there's an easier approach...?

114

answered Sep 25 '22 10:09

antinome

Verification of the post by "antinome" using two regexes, 55* and 5*:

REGEX_A: 55* [This matches "5", "55", "555", etc. and does NOT match "", "4", "54", etc.]

REGEX_B: 5* [This matches "", "5", "55", "555", etc. and does NOT match "4", "54", etc.]

[Here we've assumed that each regex must match the whole string, i.e. that 55* is not implicitly .*55*.* and 5* is not implicitly .*5*.* - this is why 5* does not match "4"]

REGEX_A can have an NFA as below:

  {A}--5-->{B}--epsilon-->{C}--5-->{D}--epsilon-->{E}
           {B} -----------------epsilon --------> {E} 
                          {C} <--- epsilon ------ {E}

REGEX_B can have an NFA as below:

  {A}--epsilon-->{B}--5-->{C}--epsilon-->{D}
  {A} --------------epsilon -----------> {D} 
                 {B} <--- epsilon ------ {D}
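
As a rough sketch of how an epsilon-NFA like the ones drawn above can be turned into a DFA, the following Python code implements epsilon-closure and the subset construction and runs them on the NFA for 5*. The function names, the EPS marker and the edge-dictionary layout are assumptions made for this illustration, and state D is taken to be the accepting state of that NFA.

```python
from collections import deque

EPS = None  # label used here for epsilon edges (an assumption of this sketch)

def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon edges."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in edges.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def subset_construction(start, accepting, edges, alphabet):
    """Return (start_set, accepting_sets, delta) of the equivalent DFA."""
    start_set = eps_closure({start}, edges)
    delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = {t for s in current for t in edges.get((s, symbol), ())}
            if not moved:
                continue  # no transition: the DFA rejects from here
            target = eps_closure(moved, edges)
            delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    accepting_sets = {s for s in seen if s & set(accepting)}
    return start_set, accepting_sets, delta

# NFA for 5* as drawn above: A -eps-> B, A -eps-> D, B -5-> C, C -eps-> D, D -eps-> B.
nfa_5_star = {
    ("A", EPS): {"B", "D"},
    ("B", "5"): {"C"},
    ("C", EPS): {"D"},
    ("D", EPS): {"B"},
}
start_set, accepting_sets, delta = subset_construction("A", {"D"}, nfa_5_star, "5")
print(sorted(start_set))                # ['A', 'B', 'D']
print(sorted(delta[(start_set, "5")]))  # ['B', 'C', 'D']
print(start_set in accepting_sets)      # True
```

The start set contains D, so the resulting DFA already accepts the empty string before reading any 5.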

Now we can derive the NFA and DFA of (REGEX_A|REGEX_B) as below:

  NFA:
  {state A}  ---epsilon --> {state B} ---5--> {state C} ---5--> {state D}
                                              {state C} ---epsilon --> {state D} 
                                              {state C} <---epsilon -- {state D}
  {state A}  ---epsilon --> {state E} ---5--> {state F}
                            {state E} ---epsilon --> {state F} 
                            {state E} <---epsilon -- {state F}

  Note: {state A} is the start state; {state D} and {state F} are accepting states.

  NFA -> DFA:

       |   5          |  epsilon*
   ----+--------------+--------
    A  |  C,D,E,F     |   A,B,E,F
    B  |  C,D         |   B
    C  |  C,D         |   C,D
    D  |  C,D         |   C,D
    E  |  E,F         |   E,F
    F  |  E,F         |   E,F

                    5(epsilon*)
    -------------+---------------------
        A,B,E,F  |  C,D,E,F 
        C,D,E,F  |  C,D,E,F 

    Finally the DFA for (REGEX_A|REGEX_B) is:
         {A,B,E,F}--5--->{C,D,E,F}
                         {C,D,E,F}---5--> {C,D,E,F}

         Note: {A,B,E,F} is the start state, and both {A,B,E,F} and {C,D,E,F} are accepting states
         (each contains F). Minimization merges them into a single accepting state with a self-loop on 5.

Similarly DFA for REGEX_A (55*) is:

       |   5    |  epsilon*
   ----+--------+--------
    A  | B,C,E  |   A
    B  | C,D,E  |   B,C,E
    C  | C,D,E  |   C
    D  | C,D,E  |   C,D,E
    E  | C,D,E  |   C,E


            5(epsilon*)
   -------+---------------------
       A  |  B,C,E  
   B,C,E  |  C,D,E
   C,D,E  |  C,D,E

    {A} ---- 5 -----> {B,C,E}--5--->{C,D,E}
                                    {C,D,E}--5--->{C,D,E}
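Note: {A} is the start state; {B,C,E} and {C,D,E} are both accepting states (each contains E), so minimization merges them into one accepting state. {A} is not accepting, so the empty string is rejected.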

Similarly DFA for REGEX_B (5*) is:

       |   5    |  epsilon*
   ----+--------+--------
    A  | B,C,D  |   A,B,D
    B  | B,C,D  |   B
    C  | B,C,D  |   B,C,D
    D  | B,C,D  |   B,D


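            5(epsilon*)
   -------+---------------------
   A,B,D  |  B,C,D  
   B,C,D  |  B,C,D

    {A,B,D} ---- 5 -----> {B,C,D}
                          {B,C,D} --- 5 ---> {B,C,D}
Note: {A,B,D} (the epsilon-closure of A) is the start state; both {A,B,D} and {B,C,D} are accepting states (each contains D), so the empty string is accepted. Minimization merges them into a single accepting state with a self-loop on 5.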

Conclusions:

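Minimal DFA of REGEX_A|REGEX_B is identical to the minimal DFA of REGEX_B
(a single accepting state with a self-loop on 5)
      -- implies REGEX_A is a subset of REGEX_B.
Minimal DFA of REGEX_A|REGEX_B is NOT identical to the minimal DFA of REGEX_A
(which rejects the empty string)
      -- implies REGEX_B is not a subset of REGEX_A.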
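
A quick empirical cross-check is also possible with Python's re module. The sketch below (the function name counterexample and its parameters are hypothetical) enumerates every string over a small alphabet up to a length bound and looks for one that REGEX_A fully matches but REGEX_B does not. It can only refute a subset claim, never prove one, because it stops at max_len; it assumes whole-string matching via fullmatch, as in the verification above.

```python
import re
from itertools import product

def counterexample(pattern_a, pattern_b, alphabet, max_len):
    """Return a string fully matched by pattern_a but not by pattern_b, or None."""
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex_a.fullmatch(candidate) and not regex_b.fullmatch(candidate):
                return candidate
    return None

# No string up to length 6 separates 55* from 5*, consistent with 55* being a subset of 5*.
print(repr(counterexample("55*", "5*", "45", 6)))  # None
# The empty string is matched by 5* but not by 55*, so 5* is not a subset of 55*.
print(repr(counterexample("5*", "55*", "45", 6)))  # ''
```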

answered Sep 23 '22 10:09

KGhatak
