I found a same question on this website, and the answer was a PDF describing how to convert an NFA to a regex. But this is not working because this method has some conditions:
And in my example the start state is just going to the next state but not to all states (e.g q0 goes to q1 but not to q2, q3), and there are transitions into the start state.
So what is the easiest way to convert an NFA to regex? I am not giving an NFA example becuase I don't have a spesific one, is just a general question, because I come across with this kind of DFA where the start state is not connected with all the states, and the are transitions into the start state.
I want a general algorithm to convert this kind of NFAs.
The answer is assuming those conditions, because any NFA can be modified to fit those requirements.
For any kind of NFA, you can add a new initial state q0 that has an epsilon-transition to the original initial state, and also using an additional transition symbol called ∅ (they call it empty set symbol, assumed to be a symbol which does not match any symbol from the original NFA) from it to any other states, then use this new state as the new initial state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the first condition.
For any kind of NFA, you can add a new acceptance state qa that has an epsilon-transition from all acceptance state in the original NFA. Then mark this as the only acceptance state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the second condition.
By the above construction, by setting q0 != qa, it satisfies the third condition.
And in the link you provided, the fourth condition is explained by having a special transition symbol called ∅ (the empty set symbol) for which no actual alphabet from original NFA can match. So you can add transitions with this new symbol from every state to any other state. Note that this does not change the language accepted by the original NFA.
So now the NFA has been modified to satisfies the four requirements, you can apply the algorithm there to convert the NFA into Regular Expression, which would accept the same language as the original NFA.
Edit to answer further question:
To answer your question in the comment, consider the NFA with two states, qA and qB. qA is the initial state as well as the only acceptance state. We have a transition from qA to itself with symbol 0,1. We also have transition from qA to qB with symbol 1. Lastly we have transition from qB to qA with symbol 0.
Visualization:
0,1 | 1 ->qA----->qB ^ | |-------| 0
Step 2. When we normalize the NFA, just put the new init state (qinit) that points to qA, and put a new acceptance state (qacc) from qA.
Step 3. We want to remove qA. So qA is the qrip in the algorithm (in page 3). Now we need to consider every states that enters qA and every states that exits from qA. In this case, there are two states pointing to qA, that are qinit and qB. There are two states that are pointed to by qA, that are qB and qacc. By the algorithm, we replace the transitions qin->qrip->qout with a transition qin->qout, having the transition symbol Rdir+Rin(Rrip)*Rout, where:
So in this case we replace the transition qinit->qA->qB with qinit->qB with transition symbol (0+1)*1. Continuing this process, we will create in total 4 new transitions:
Then we can remove qA.
Step 4. We want to remove qB. Again, we identify the qin and qout. There is only one state coming to qB here, which is qinit, and there is only one state departing from qB, which is qacc. So we have:
So the new transition qinit->qacc will be:
Rdir+Rin(Rrip)*Rout
(0+1)* + (0+1)*1 (0(0+1)*1)* 0(0+1)*
And we can remove qB.
Step 5. Since every state in the original NFA has been removed, we are done. So the final regex is shown above.
Note that the final regex might not be optimal (and in most cases it won't be optimal), this is expected from the algorithm. Finding the shortest regex for an NFA (or even DFA) in general is very difficult (although for this example it's easy to see that the first component already covers all possible strings)
For completeness, the shortest regex accepting the same language will be:
(0+1)*
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With