Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Steps to creating an NFA from a regular expression

I'm having issues 'describing each step' when creating an NFA from a regular expression. The question is as follows:

Convert the following regular expression to a non-deterministic finite-state automaton (NFA), clearly describing the steps of the algorithm that you use: (b|a)*b(a|b)

I've made a simple 3-state machine but it's very much from intuition. This is a question from a past exam written by my lecturer, who also wrote the following explanation of Thompson's algorithm: http://www.cs.may.ie/staff/jpower/Courses/Previous/parsing/node5.html

Can anyone clear up how to 'describe each step clearly'? It just seems like a set of basic rules rather than an algorithm with steps to follow.

Maybe there's an algorithm I've glossed over somewhere but so far I've just created them with an educated guess.

like image 326
kiliki Avatar asked Aug 05 '12 18:08

kiliki


People also ask

How will you construct a non-deterministic finite automata from a regular expression?

Method. Step 1 Construct an NFA with Null moves from the given regular expression. Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA. It is an NDFA corresponding to the RE − 1 (0 + 1)* 0.

How can create DFA in regex?

Utility – To construct DFA from a given regular expression, we can first construct an NFA for the given expression and then convert this NFA to DFA by a subset construction method. But to avoid this two-step procedure, the other way round is to directly construct a DFA for the given expression.


2 Answers

Short version for general approach.
There's an algo out there called the Thompson-McNaughton-Yamada Construction Algorithm or sometimes just "Thompson Construction." One builds intermediate NFAs, filling in the pieces along the way, while respecting operator precedence: first parentheses, then Kleene Star (e.g., a*), then concatenation (e.g., ab), followed by alternation (e.g., a|b).

Here's an in-depth walkthrough for building (b|a)*b(a|b)'s NFA

Building the top level

  1. Handle parentheses. Note: In actual implementation, it can make sense to handling parentheses via a recursive call on their contents. For the sake of clarity, I'll defer evaluation of anything inside of parens.

  2. Kleene Stars: only one * there, so we build a placeholder Kleene Star machine called P (which will later contain b|a). Intermediate result:
    Non-Deterministic Finite Automata for P*

  3. Concatenation: Attach P to b, and attach b to a placeholder machine called Q (which will contain (a|b). Intermediate result:
    Non-Deterministic Finite Automata for P*bQ

  4. There's no alternation outside of parentheses, so we skip it.

Now we're sitting on a P*bQ machine. (Note that our placeholders P and Q are just concatenation machines.) We replace the P edge with the NFA for b|a, and replace the Q edge with the NFA for a|b via recursive application of the above steps.


Building P

  1. Skip. No parens.

  2. Skip. No Kleene stars.

  3. Skip. No contatenation.

  4. Build the alternation machine for b|a. Intermediate result:
    NFA for b or a


Integrating P

Next, we go back to that P*bQ machine and we tear out the P edge. We have the source of the P edge serve as the starting state for the P machine, and the destination of the P edge serve as the destination state for the P machine. We also make that state reject (take away its property of being an accept state). The result looks like this:
enter image description here


Building Q

  1. Skip. No parens.

  2. Skip. No Kleene stars.

  3. Skip. No contatenation.

  4. Build the alternation machine for a|b. Incidentally, alternation is commutative, so a|b is logically equivalent to b|a. (Read: skipping this minor footnote diagram out of laziness.)


Integrating Q

We do what we did with P above, except replacing the Q edge with the intermedtae b|a machine we constructed. This is the result:
enter image description here

Tada! Er, I mean, QED.


Want to know more?

All the images above were generated using an online tool for automatically converting regular expressions to non-deterministic finite automata. You can find its source code for the Thompson-McNaughton-Yamada Construction algorithm online.

The algorithm is also addressed in Aho's Compilers: Principles, Techniques, and Tools, though its explanation is sparse on implementation details. You can also learn from an implementation of the Thompson Construction in C by the excellent Russ Cox, who described it some detail in a popular article about regular expression matching.

like image 112
Wayland Smith Avatar answered Sep 20 '22 11:09

Wayland Smith


In the GitHub repository below, you can find a Java implementation of Thompson's construction where first an NFA is being created from the regex and then an input string is being matched against that NFA:

https://github.com/meghdadFar/regex

like image 25
MAZDAK Avatar answered Sep 21 '22 11:09

MAZDAK