I'm new to TDD, and I find RegExp quite a particular case. Is there any special way to unit test them, or may I just treat them as regular functions?
To test a regular expression, first search for errors such as non-escaped characters or unbalanced parentheses. Then test it against various input strings to ensure it accepts correct strings and regex wrong ones. A regex tester tool is a great tool that does all of this.
Regular expressions (or regex for short), allow us to define and set rules to check whether a pattern of characters exist in a text string or not. Regular expressions help us to match, locate and manage text, providing a quick and relatively easy way to manipulate data particularly in large complex programmes.
It's a negative lookahead, which means that for the expression to match, the part within (?!...) must not match. In this case the regex matches http:// only when it is not followed by the current host name (roughly, see Thilo's comment).
You should always test your regexen, much like any other chunk of code. They're at the most simple a function that takes a string and returns a bool, or returns an array of values.
Here are some suggestions on what to think about when it comes to designing unit tests for regexen. These are not not hard and fast prescriptions for unit test design, but some guidelines to shape your thinking. As always, weigh the needs of your testing versus cost of failure balanced with the time required to implement them all. (I find that 'implementing' the test is the easy part! :-] )
Points to consider:
For a regex that returns lists, also remember:
(?<name> thing1 ( thing2) )
) - this behavior can be different based on the regex engine you're using.If you use any advanced features, such as non-backtracking groups, make sure you understand completely how the feature works, and using the guidelines above, build example strings that should work for and against each of them.
Depending on your regex library implementation, the way groups are captured may be different as well. Perl 5 has a 'open paren order' ordering, C# has that partially except for named groups and so on. Make sure to experiment with your flavor to know exactly what it does.
Then, integrate them right in with your other unit tests, either in their own module or alongside the module that contains the regex. For particularly nasty regexen, you may find you need lots and lots of tests to verify that the pattern and all the features you use are correct. If the regex makes up a large (or nearly all) of the work that the method is doing, I will use the advice above to fashion inputs to test that function and not the regex directly. That way, if later you decide that the regex is not the way to go, or you want to break it up, you can capture the behavior the regex provided without changing the interface - ie, the method that invokes the regex.
As long as you really know how a regex feature is supposed to work in your flavor of regex, you should be able to develop decent test cases for it. Just make sure you really, really, really do understand how the feature works!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With