Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simplify this regular expression

Tags:

regex

simplify

I'm doing some pre-exam exercises for my compilers class, and needed to simplify this regular expression.

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Quite obviously, the e is the empty string, and the U stands for union.

So far, I think one of the (a U b)* can be removed, as the union of a U a = a. However, I can't find any other simplifications, and am not doing so well with the other problems thus far. :(

Any help is appreciated, thanks very much!

like image 632
CompilersBeginner Avatar asked Feb 10 '11 01:02

CompilersBeginner


2 Answers

First translate to an english description of the language:

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Translates to:


Any sequence of as or bs, followed by an optional a, followed by any number of bs.

OR

Any number of as and bs, followed by an optional b, follwed by any number of as


There is a lot of overlap here - at least (a U b)*(a U e) is exactly the same as (a U b)*, because "Any sequence of as and bs" necessarily either ends with an a or epsilon (as any string can end with epsilon) so those groups can be eliminated, leaving

(a U b)*b* U (a U b)*a*

Translates to:


Any sequence of as or bs, followed by any number of bs.

OR

Any number of as and bs, follwed by any number of as


Now the first section of those to outermost groups is the same, so lets collapse those into one

(a U b)*(a* U b*)

Translates to:


Any sequence of as or bs, followed by any number of as OR by any number bs.


now hold on a minute, "Any sequence of As and Bs" necessarily ends with "Any sequence of as OR any sequence of bs", which means anything which matches the first part can match the whole regex (because the second part can have a length of zero) so why don't we just make it

(a U b)*

Ta Da. Simple.

like image 151
tobyodavies Avatar answered Sep 28 '22 17:09

tobyodavies


Little rusty on regex, but if * still represents the "zero or more ocurrences" you can replace:

(a U e)b* for (a U b)*

which leaves the first part with:

(a U b)*(a U b)* = (a U b)*

On the right side, you have that

(b U e)a* = (b U a)*

Now, since a U b = b U a, you get:

(a U b)*(a U b)*

on the right hand side, which leaves just

(a U b)* U (a U b)* = (a U b)*

I think that's it...

like image 26
Piva Avatar answered Sep 28 '22 18:09

Piva