Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression that generates a DFA with dead or superfluous states

Tags:

regex

dfa

nfa

I'm looking to implement a DFA minimizer in my lexer, but I can't seem to produce a DFA that doesn't look like it's already the minimal DFA for the expression.

I'm constructing the DFA from a NFA that is built using thomson construction from a postfix regular expression. It's pretty much exactly what is being described in the dragon book. To make the lexer several of the NFAs are combined using epsilon transitions from the start state. It's on this combined NFA that the DFA algorithm is applied.

So, is there any "known" regular expression that will generate a DFA which will make a nice test bed for dead state elimination and state minimization?

I could of course just hack up a weird DFA and apply the algorithms on it, but it would not really be a proper test case would it? If it's so that the method I'm constructing DFAs isn't prone to dead states, then that information would be just as valueable, since then I can skip implementing the state elimination feature altogether.

Edit: In case you need implementation details in order to accurately answer, the code is available on github, specifically the NFA.cs and DFA.cs classes. Additionally I wrote a series on blog posts on the construction algorithm I'm using, if that helps.

like image 468
Dervall Avatar asked Feb 20 '12 10:02

Dervall


1 Answers

Ok, so I found this out in a totally roundabout way. I made a tool for visualizing regular expression since I got quite a nice debug output from my parser. This aptly illustrates such an expression that using standard thompson construction techniques will give you a pretty stupid automata: (a+b+c+)+|abc

Shown in the tool: http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#

This tool currently performs a straight up thompson construction without any optimization. If you remove the |abc part of the expression which is entirely superfluous the expression should stay the same. It doesn't.

like image 178
Dervall Avatar answered Sep 23 '22 07:09

Dervall