I made a simple regular expression engine, which supports concatenation, alternation, closure, and char a .. z
.
The way I represent nfa and dfa is to use record:
type state = int with sexp, compare
type alphabet = char with sexp, compare
type transaction = state * alphabet option * state with sexp, compare
type d_transaction = state * alphabet * state with sexp, compare
type state_set = State_set.t
type states_set = States_set.t
type nfa = {
states : State_set.t ;
alphabets : Alphabet_set.t ;
transactions : Transaction_set.t;
start_state : state;
final_states : State_set.t;
}
type dfa = {
d_states : State_set.t ;
d_alphabets : Alphabet_set.t;
d_transactions : D_Transaction_set.t ;
d_start_state : state ;
d_final_states : State_set.t;
}
For example, a string "a*" will be parsed into Closure (Char 'a')
, and then is transformed
to nfa:
states: 0 1 2 3
alphabets: a
transactions: 0->e->1, 1->a>2, 2->e->3, 2->e->1, 0->e->3
start_state: 0
final_states: 3
then dfa:
states: 0 1
alphabets: a
transactions: 0->a->1, 1->a->1
start_state: 0
final_states: 0 1
However, I use a lot of recursion in my code. The way my program generates state number for each node in nfa and dfa is realy unpredictable. I have no idea how to verify if the generated dfa is correct without using a pen and a piece of paper to test it myself
I am trying to figure out a better way to test my code so I can add more features into my program in the future.
Can anyone please give me some suggestions?
One fairly elaborate plan would be to convert your DFA back into a regular expression, then test whether the result is equivalent to your original regular expression. Here is a SO page giving some ways to test for RE equivalence: Regex: Determine if two regular expressions could match for the same input?
The hope is that the two inverse conversions would help debug each other :-)
Short of formal verification, you could:
EDIT: I think if you want to directly test your DFAs, you may want to write some kind of little specialized "coverage tool" for your specific types, that tells you what fraction of states and/or state/transition pairs you have reached during testing of each DFA, and which ones. This would be some modified form of the function that you currently use to walk your DFAs along input strings.
Disclaimer: I am currently contributing towards improving Bisect_ppx (which is the "modern" fork of Bisect). I'm not affiliated or involved with anything else mentioned here, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With