I'm just wondering if it's possible to use one regular expression to match another, that is some sort of:
['a-z'].match(['b-x'])
True
['m-n'].match(['0-9'])
False
Is this sort of thing possible with regex at all? I'm doing work in python, so any advice specific to the re
module's implementation would help, but I'll take anything I can get concerning regex.
Edit: Ok, some clarification is obviously in order! I definitely know that normal matching syntax would look something like this:
expr = re.compile(r'[a-z]*')
string = "some words"
expr.match(string)
<sRE object blah blah>
but I'm wondering if regular expressions have the capability to match other, less specific expressions in the non-syntacticly correct version I tried to explain with above, any letter from b-x would always be a subset (match) of any letter from a-z. I know just from trying that this isn't something you can do by just calling the match of one compiled expression on another compiled expression, but the question remains: is this at all possible?
Let me know if this still isn't clear.
We say that two regular expressions R and S are equivalent if they describe the same language. In other words, if L(R) = L(S) for two regular expressions R and S then R = S. Examples.
The plus sign + is a greedy quantifier, which means one or more times. For example, expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
No, if you use standard regular expressions. The reason is that you cannot satisfy the pumping lemma for regular languages.
I think — in theory — to tell whether regexp A
matches a subset of what regexp B
matches, an algorithm could:
B
and also of the "union" A|B
.However, it would likely be a major project to do this in practice. There are explanations such as Constructing a minimum-state DFA from a Regular Expression but they only tend to consider mathematically pure regexps. You would also have to handle the extensions that Python adds for convenience. Moreover, if any of the extensions cause the language to be non-regular (I am not sure if this is the case) you might not be able to handle those ones.
But what are you trying to do? Perhaps there's an easier approach...?
REGEX_A: 55* [This matches "5", "55", "555" etc. and does NOT match "4" , "54" etc]
REGEX_B: 5* [This matches "", "5" "55", "555" etc. and does NOT match "4" , "54" etc]
[Here we've assumed that 55* is not implicitly .55.* and 5* is not .5.* - This is why 5* does not match 4]
{A}--5-->{B}--epsilon-->{C}--5-->{D}--epsilon-->{E}
{B} -----------------epsilon --------> {E}
{C} <--- epsilon ------ {E}
{A}--epsilon-->{B}--5-->{C}--epsilon-->{D}
{A} --------------epsilon -----------> {D}
{B} <--- epsilon ------ {D}
NFA:
{state A} ---epsilon --> {state B} ---5--> {state C} ---5--> {state D}
{state C} ---epsilon --> {state D}
{state C} <---epsilon -- {state D}
{state A} ---epsilon --> {state E} ---5--> {state F}
{state E} ---epsilon --> {state F}
{state E} <---epsilon -- {state F}
NFA -> DFA:
| 5 | epsilon*
----+--------------+--------
A | B,C,E,F,G | A,C,E,F
B | C,D,E,F | B,C,E,F
c | C,D,E,F | C
D | C,D,E,F,G | C,D,E,F
E | C,D,E,F,G | C,E,F
F | C,E,F,G | F
G | C,D,E,G | C,E,F,G
5(epsilon*)
-------------+---------------------
A | B,C,E,F,G
B,C,E,F,G | C,D,E,F,G
C,D,E,F,G | C,D,E,F,G
Finally the DFA for (REGEX_A|REGEX_B) is:
{A}--5--->{B,C,E,F,G}--5--->{C,D,E,F,G}
{C,D,E,F,G}---5--> {C,D,E,F,G}
Note: {A} is start state and {C,D,E,F,G} is accepting state.
| 5 | epsilon*
----+--------+--------
A | B,C,E | A
B | C,D,E | B,C,E
C | C,D,E | C
D | C,D,E | C,D,E
E | C,D,E | C,E
5(epsilon*)
-------+---------------------
A | B,C,E
B,C,E | C,D,E
C,D,E | C,D,E
{A} ---- 5 -----> {B,C,E}--5--->{C,D,E}
{C,D,E}--5--->{C,D,E}
Note: {A} is start state and {C,D,E} is accepting state
| 5 | epsilon*
----+--------+--------
A | B,C,D | A,B,D
B | B,C,D | B
C | B,C,D | B,C,D
D | B,C,D | B,D
5(epsilon*)
-------+---------------------
A | B,C,D
B,C,D | B,C,D
{A} ---- 5 -----> {B,C,D}
{B,C,D} --- 5 ---> {B,C,D}
Note: {A} is start state and {B,C,D} is accepting state
DFA of REGX_A|REGX_B identical to DFA of REGX_A
-- implies REGEX_A is subset of REGEX_B
DFA of REGX_A|REGX_B is NOT identical to DFA of REGX_B
-- cannot infer about either gerexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With