I'm developing a software to generate a Turing Machine from a regular expression. In other words, I want to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task.
This is what I have done:
I've modelled the regular expression as follows:
RegularExpression (interface): the classes below implements this interface
Simple (ie: aaa, bbb, abcde): this is a leaf class; it does not have any subexpressions
ComplexWithoutOr (ie: a(ab)*, (a(ab)*c(b)*)*): this class contains a list of RegularExpression.
ComplexWithOr (exampe: a(a|b)*, (a((ab)*|c(b)*)*): this class contains an Or operation, which contains a list of RegularExpression. It represents the a|b part of the first example and the (ab)*|c(b)* of the second one.
Variable (example: awcw, where w ∈ {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic from Simple. It represents the w part of the examples.
When it comes to Turing machine (TM) generation, I have different levels of complexity:
Simple: this type of expression is already working. Generates a new state for each letter and moves right. If in any state, the letter read is not the expected, it starts a "rollback circuit" that finishes with the TM head in the initial position and stops in a not final state.
ComplexWithoutOr: here comes my problem. The algorithm generates a TM for each subexpression and concatenates them. This works for some simple cases, but I have problems with the rollback mechanism.
Here is an example input for which my algorithm fails:
(ab)*abac: this is a ComplexWithoutOr expression that contains a ComplexWithOr expression (ab)* (that has a Simple expression inside: ab) and a Simple expression abac
My algorithm generates first an TM1 for ab. This TM1 is used by TM2 for (ab)*, so if TM1 succeeds, it enters again in TM1, otherwise TM1 rolls back and TM2 finishes with success. In other words, TM2 cannot fail.
Then, it generates a TM3 for abac. The output of TM2 is the input of TM3. The output of TM3 is the result of the execution.
Now, suppose this input string: abac. As you can see it matches with the regular expression. So let's see what happens when the TM is executed.
TM1 executes with success the first time: ab. TM1 fails the second time for ac and rolls back, putting the TM head in the 3rd position at a. TM2 finishes with success and the input is forwarded to TM3. TM3 fails, because ac doesn't match abac. So the TM does not recognize abac.
This is a problem. How to solve this?
I'm using Java to develop it, but the language it is not important. My question concerns the algorithm.
It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.
Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.
Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.
As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With