Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OCaml: How to test my own regular expression library

I made a simple regular expression engine, which supports concatenation, alternation, closure, and char a .. z.

The way I represent nfa and dfa is to use record:

type state       = int with sexp, compare
type alphabet    = char with sexp, compare
type transaction = state * alphabet option * state with sexp, compare
type d_transaction = state * alphabet * state with sexp, compare

type state_set = State_set.t
type states_set = States_set.t

type nfa = {
  states       : State_set.t ;
  alphabets    : Alphabet_set.t ;
  transactions : Transaction_set.t; 
  start_state  : state;
  final_states : State_set.t;
}


type dfa = {
  d_states       : State_set.t ;
  d_alphabets    : Alphabet_set.t;
  d_transactions : D_Transaction_set.t ;
  d_start_state  : state ;
  d_final_states : State_set.t;
}

For example, a string "a*" will be parsed into Closure (Char 'a'), and then is transformed to nfa: states: 0 1 2 3 alphabets: a transactions: 0->e->1, 1->a>2, 2->e->3, 2->e->1, 0->e->3 start_state: 0 final_states: 3

then dfa:

states: 0 1 alphabets: a transactions: 0->a->1, 1->a->1 start_state: 0 final_states: 0 1

However, I use a lot of recursion in my code. The way my program generates state number for each node in nfa and dfa is realy unpredictable. I have no idea how to verify if the generated dfa is correct without using a pen and a piece of paper to test it myself

I am trying to figure out a better way to test my code so I can add more features into my program in the future.

Can anyone please give me some suggestions?

like image 927
齐天大圣 Avatar asked Jan 27 '16 16:01

齐天大圣


2 Answers

One fairly elaborate plan would be to convert your DFA back into a regular expression, then test whether the result is equivalent to your original regular expression. Here is a SO page giving some ways to test for RE equivalence: Regex: Determine if two regular expressions could match for the same input?

The hope is that the two inverse conversions would help debug each other :-)

like image 59
Jeffrey Scofield Avatar answered Nov 15 '22 13:11

Jeffrey Scofield


Short of formal verification, you could:

  1. Use a unit testing library such as OUnit or alcotest to test your engine against a large number of examples. Here is a nice blog post comparing some other testing libraries.
  2. Combine that with a coverage tool such as Bisect_ppx, which serves two purposes: it directly helps make sure that your examples test the various branches in your generator, and also indirectly causes you to look at the generator much more closely and think about how to write examples to test the various code paths. Here is another blog post that briefly compares coverage tools.

EDIT: I think if you want to directly test your DFAs, you may want to write some kind of little specialized "coverage tool" for your specific types, that tells you what fraction of states and/or state/transition pairs you have reached during testing of each DFA, and which ones. This would be some modified form of the function that you currently use to walk your DFAs along input strings.

Disclaimer: I am currently contributing towards improving Bisect_ppx (which is the "modern" fork of Bisect). I'm not affiliated or involved with anything else mentioned here, though.

like image 2
antron Avatar answered Nov 15 '22 13:11

antron