I'm new to regular expressions. I've been able to write a few through trial and error, so I tried a few programs to help me write them, but the programs were harder to understand than the regular expressions themselves. Any recommended programs? I do most of my programming under Linux.
Try YAPE::Regex::Explain for Perl:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use YAPE::Regex::Explain;

    # Print a node-by-node explanation of the compiled regex.
    print YAPE::Regex::Explain->new(
        qr/^\A\w{2,5}0{2}\S \n?\z/i
    )->explain;
Output:
    The regular expression:

    (?i-msx:^\A\w{2,5}0{2}\S \n?\z)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?i-msx:                 group, but do not capture (case-insensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      \A                       the beginning of the string
    ----------------------------------------------------------------------
      \w{2,5}                  word characters (a-z, A-Z, 0-9, _)
                               (between 2 and 5 times (matching the most
                               amount possible))
    ----------------------------------------------------------------------
      0{2}                     '0' (2 times)
    ----------------------------------------------------------------------
      \S                       non-whitespace (all but \n, \r, \t, \f,
                               and " ")
    ----------------------------------------------------------------------
                               ' '
    ----------------------------------------------------------------------
      \n?                      '\n' (newline) (optional (matching the
                               most amount possible))
    ----------------------------------------------------------------------
      \z                       the end of the string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
RegexPal is a great, free JavaScript regex tester. Because it uses the JavaScript regex engine, it doesn't have some of the more advanced regex features, but it works pretty well for a lot of regular expressions. The feature I miss most is lookbehind assertions.
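When an engine lacks lookbehind, a capturing group is often an adequate substitute. Here is a minimal sketch in Python, whose `re` module supports both forms so they can be compared side by side. The sample text and variable names are made up for illustration.

```python
import re

text = "Total: $42 and $7"

# With lookbehind: match digits only when they are preceded by a dollar sign.
with_lookbehind = re.findall(r"(?<=\$)\d+", text)

# Without lookbehind: match the dollar sign as well, but capture only the digits.
with_group = re.findall(r"\$(\d+)", text)

print(with_lookbehind)  # ['42', '7']
print(with_group)       # ['42', '7']
```

The trade-off is that the capturing-group version consumes the dollar sign as part of the overall match, which matters if you need the match position or are doing a replacement.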
Most regex bugs fall into three categories:
Subtle omissions - leaving out '^' at the start or '$' at the end, or using '*' where you should have used '+'. These are just beginner mistakes, but it's common for the buggy regex to still pass all of the automated tests.
Accidental success - where part of the regex is completely wrong and is destined to fail in 99% of real-world use, but by sheer dumb luck it manages to pass the half-dozen automated tests you wrote.
Too much success - where one part of the regex matches a whole lot more than you thought. For example, the token [^., ]* will also match \r and \n, meaning that your regex can now match multiple lines of text even though you wrapped it in ^ and $.
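The first and third categories are easy to reproduce. The following is a rough sketch using Python's `re` module, chosen only for illustration. The patterns and sample strings are made up, and other engines differ in details such as how `$` treats a trailing newline.

```python
import re

# Subtle omission 1: without anchors the pattern matches anywhere in the input.
unanchored = re.compile(r"\d{3}")
anchored = re.compile(r"^\d{3}$")
print(bool(unanchored.search("abc1234xyz")))  # True
print(bool(anchored.search("abc1234xyz")))    # False
print(bool(anchored.search("123")))           # True

# Subtle omission 2: '*' allows zero repetitions, so an empty string passes.
star = re.compile(r"^\d*$")
plus = re.compile(r"^\d+$")
print(bool(star.search("")))  # True
print(bool(plus.search("")))  # False

# Too much success: a negated class matches everything that is not listed,
# including \r and \n, so the match can span several lines.
loose_class = re.compile(r"^[^., ]*$")
single_line = re.compile(r"^[^., \r\n]*$")
print(bool(loose_class.search("first\nsecond")))  # True
print(bool(single_line.search("first\nsecond")))  # False
```

Note that a test suite containing only well-formed single-line inputs would pass with either version of each pattern, which is why these bugs tend to survive automated testing.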
There really is no substitute for properly learning regex. Read the reference manual for your regex engine, and use a tool like RegexBuddy to experiment and familiarize yourself with all of the features, taking special note of any unusual behaviours they can exhibit. If you learn regex properly, you will avoid most of the bugs mentioned above, and you will know how to write just a small number of automated tests that cover all of the edge cases without over-testing obvious things (does [A-Z] really match every letter between A and Z? I'd better write 26 variations of the unit test to make sure!).
If you don't learn regex completely, you will need to write a ridiculous amount of automated tests to prove that your magical regex is correct.
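As a rough sketch of what "a small number of tests" can look like, the checks below probe only the boundaries of [A-Z] rather than all 26 letters. Python's `re` is used for illustration; the helper name `is_upper_ascii` and the chosen test characters are assumptions made for this example.

```python
import re

UPPER = re.compile(r"[A-Z]")


def is_upper_ascii(ch: str) -> bool:
    """Return True if the whole string ch is a single character in [A-Z]."""
    return UPPER.fullmatch(ch) is not None


# Both ends of the range, plus the characters just outside it in ASCII:
# '@' comes immediately before 'A' and '[' immediately after 'Z'.
assert is_upper_ascii("A")
assert is_upper_ascii("Z")
assert not is_upper_ascii("@")
assert not is_upper_ascii("[")

# Lowercase letters are outside the range unless a case-insensitive flag is set.
assert not is_upper_ascii("a")

# Empty and multi-character inputs cannot match a single-character class.
assert not is_upper_ascii("")
assert not is_upper_ascii("AB")
```

Seven assertions cover the range boundaries, case sensitivity, and input length, which is where a character class can realistically go wrong.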