Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert a NFA to Regular Expression

Tags:

regex

nfa

I found a same question on this website, and the answer was a PDF describing how to convert an NFA to a regex. But this is not working because this method has some conditions:

  1. There are transitions going from the initial state to all other states, and there are no transitions into the initial state.
  2. There is a single accept state that has only transitions coming into it (and no outgoing transitions).
  3. The accept state is distinct from the initial state.
  4. Except for the initial and accepting states, all other states are connected to all other states via a transition. In particular, each state has a transition to itself.

And in my example the start state is just going to the next state but not to all states (e.g q0 goes to q1 but not to q2, q3), and there are transitions into the start state.

So what is the easiest way to convert an NFA to regex? I am not giving an NFA example becuase I don't have a spesific one, is just a general question, because I come across with this kind of DFA where the start state is not connected with all the states, and the are transitions into the start state.

I want a general algorithm to convert this kind of NFAs.

like image 949
Dr. Programmer Avatar asked Oct 03 '22 09:10

Dr. Programmer


1 Answers

The answer is assuming those conditions, because any NFA can be modified to fit those requirements.

For any kind of NFA, you can add a new initial state q0 that has an epsilon-transition to the original initial state, and also using an additional transition symbol called ∅ (they call it empty set symbol, assumed to be a symbol which does not match any symbol from the original NFA) from it to any other states, then use this new state as the new initial state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the first condition.

For any kind of NFA, you can add a new acceptance state qa that has an epsilon-transition from all acceptance state in the original NFA. Then mark this as the only acceptance state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the second condition.

By the above construction, by setting q0 != qa, it satisfies the third condition.

And in the link you provided, the fourth condition is explained by having a special transition symbol called ∅ (the empty set symbol) for which no actual alphabet from original NFA can match. So you can add transitions with this new symbol from every state to any other state. Note that this does not change the language accepted by the original NFA.

So now the NFA has been modified to satisfies the four requirements, you can apply the algorithm there to convert the NFA into Regular Expression, which would accept the same language as the original NFA.

Edit to answer further question:

To answer your question in the comment, consider the NFA with two states, qA and qB. qA is the initial state as well as the only acceptance state. We have a transition from qA to itself with symbol 0,1. We also have transition from qA to qB with symbol 1. Lastly we have transition from qB to qA with symbol 0.

Visualization:

 0,1    
  |  1
->qA----->qB
  ^       |
  |-------|
     0

Step 2. When we normalize the NFA, just put the new init state (qinit) that points to qA, and put a new acceptance state (qacc) from qA.

Step 3. We want to remove qA. So qA is the qrip in the algorithm (in page 3). Now we need to consider every states that enters qA and every states that exits from qA. In this case, there are two states pointing to qA, that are qinit and qB. There are two states that are pointed to by qA, that are qB and qacc. By the algorithm, we replace the transitions qin->qrip->qout with a transition qin->qout, having the transition symbol Rdir+Rin(Rrip)*Rout, where:

  1. Rdir is the original transition from qin to qout
  2. Rin is the original transition from qin to qrip
  3. Rrip is the original loop at qrip
  4. Rout is the original transition from qrip to qout

So in this case we replace the transition qinit->qA->qB with qinit->qB with transition symbol (0+1)*1. Continuing this process, we will create in total 4 new transitions:

  1. qinit->qB: (0+1)*1
  2. qinit->qacc: (0+1)*
  3. qB->qB: 0(0+1)*1
  4. qB->qacc: 0(0+1)*

Then we can remove qA.

Step 4. We want to remove qB. Again, we identify the qin and qout. There is only one state coming to qB here, which is qinit, and there is only one state departing from qB, which is qacc. So we have:

  1. Rdir = (0+1)*
  2. Rin = (0+1)*1
  3. Rrip = 0(0+1)*1
  4. Rout = 0(0+1)*

So the new transition qinit->qacc will be:

Rdir+Rin(Rrip)*Rout

(0+1)* + (0+1)*1 (0(0+1)*1)* 0(0+1)*

And we can remove qB.

Step 5. Since every state in the original NFA has been removed, we are done. So the final regex is shown above.

Note that the final regex might not be optimal (and in most cases it won't be optimal), this is expected from the algorithm. Finding the shortest regex for an NFA (or even DFA) in general is very difficult (although for this example it's easy to see that the first component already covers all possible strings)

For completeness, the shortest regex accepting the same language will be:

(0+1)*

like image 126
justhalf Avatar answered Oct 13 '22 10:10

justhalf