There are several tools that generate sample data for a given regex. Some include:
However, while they may be sufficient to seed a dataset, they don't help much when testing code that depends on the regex itself, such as validation.
Assume you have a code generator that generates a model with a property. The user specifies a regex to validate the property. Assume now that the code generator is attempting to generate tests to ensure that validation succeeds and fails appropriately. It seems reasonable for the tool to focus on boundary cases within the regex to avoid generating unnecessary data.
For example, consider a regex ^([a-z]{3,6})$
then the boundary cases include:
The reason for focusing on boundary cases is that any string consisting only of [a-z] with a length greater than 6 exercises the same upper bound on the string length defined in the regex. So testing strings of length 7, 8, and 9 is really just testing the same (boundary) condition.
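To make the idea concrete, here is a minimal sketch in Python of what such boundary inputs could look like for ^([a-z]{3,6})$. The helper name boundary_cases and its parameters are hypothetical, and the bounds 3 and 6 are passed in by hand rather than derived from the regex, which is the part a real tool would have to automate.

```python
import re

# The example regex from the question.
PATTERN = re.compile(r"^([a-z]{3,6})$")


def boundary_cases(min_len, max_len, valid_char="a", invalid_char="1"):
    """Return (input, should_match) pairs around the quantifier bounds.

    Hypothetical helper: the bounds are supplied by the caller,
    not extracted from the regex.
    """
    return [
        (valid_char * (min_len - 1), False),  # one below the lower bound
        (valid_char * min_len, True),         # exactly the lower bound
        (valid_char * max_len, True),         # exactly the upper bound
        (valid_char * (max_len + 1), False),  # one above the upper bound
        # Valid length, but one character outside the [a-z] class.
        (valid_char * (min_len - 1) + invalid_char, False),
    ]


for text, should_match in boundary_cases(3, 6):
    matched = PATTERN.fullmatch(text) is not None
    assert matched == should_match, (text, should_match)
```

Each pair could become one generated test case: the two accepted inputs pin down the bounds from the inside, and the rejected ones pin them down from the outside.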
This was an arbitrary regex chosen for its simplicity, but any reasonable regex may act as an input.
Does a framework or tool exist that the code generator can use to generate input strings for test cases across the different layers of the systems being generated? The test cases come into their own when the system is no longer generated but is modified later in the development cycle.
If I understand your question correctly, you want to generate input for the system based on the validation regex so that you can automate unit testing.
Doesn't this defeat the purpose of unit testing, though? If someone changes the regex, wouldn't you want the validation to fail?
In any case, the simple answer is that generating a string from a regex is all but impossible. If it could be done, it would be extremely complex. For example, consider this regex:
(?<=\G\d{0,3})(?>[a-z]+)(?<=(?<foo>foo)|)(?(foo)(?!))
It is very simple for me to think of a string that would match (and/or generate matches):
abc123def456ghi789jkl123foo456pqr789stu123vwx456yz
The matches would be:
But how would you generate a string from the expression? There is no clear starting point - it takes some extreme (for a computer) intelligence plus a dash of creativity to work out a solution. Something simple for a human, but very, very hard for a computer. Even if you could come up with a computer algorithm that would generate a matching string, it could easily look something like this:
a
This would generate a match, but it does a poor job of exercising the regex. The \d{0,3} is never really tried, and \G is only ever used to match the beginning of the input (rather than the end of the last match). (?<=(?<foo>foo)) is never tested (and if it were, it would result in a non-match).
It would also be easy to generate a string that does not match:
1
But, again, this doesn't really put the regex through its paces.
I don't know computer theory well enough to prove it, but I believe this is related to the P versus NP class of problems. It is relatively easy to write a regex that matches a collection of complex strings, but difficult to generate a collection of complex strings that match a regex.