
Recommended method for testing regular expressions?

Tags: regex, linux

I'm new to regular expressions. I've been able to write a few through trial and error, so I tried a few programs to help me write them, but the programs were harder to understand than the regular expressions themselves. Any recommended programs? I do most of my programming under Linux.

Roberto Rosario asked Oct 06 '09 22:10


3 Answers

Try YAPE::Regex::Explain for Perl:

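#!/usr/bin/perl

use strict;
use warnings;

# CPAN module that prints a plain-English breakdown of a regex.
use YAPE::Regex::Explain;

# Pass a compiled regex (qr//) and print the node-by-node explanation.
print YAPE::Regex::Explain->new(
    qr/^\A\w{2,5}0{2}\S \n?\z/i
)->explain;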

Output:

The regular expression:

(?i-msx:^\A\w{2,5}0{2}\S \n?\z)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  \w{2,5}                  word characters (a-z, A-Z, 0-9, _)
                           (between 2 and 5 times (matching the most
                           amount possible))
----------------------------------------------------------------------
  0{2}                     '0' (2 times)
----------------------------------------------------------------------
  \S                       non-whitespace (all but \n, \r, \t, \f,
                           and " ")
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  \n?                      '\n' (newline) (optional (matching the
                           most amount possible))
----------------------------------------------------------------------
  \z                       the end of the string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
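
If Perl is not at hand, a rough analogue exists in Python's standard library: compiling a pattern with the re.DEBUG flag prints the parsed structure of the expression. The sketch below is an illustration and not part of the answer above. The helper name dump_regex is hypothetical, and the pattern uses \Z where the Perl version uses \z, because that is how Python's re module spells the end-of-string anchor. The dump is much terser than the YAPE output, and its exact text varies between Python versions.

```python
import contextlib
import io
import re


def dump_regex(pattern, flags=0):
    """Return the parse-tree dump that re.DEBUG prints for a pattern."""
    buffer = io.StringIO()
    # re.DEBUG writes its dump to standard output, so capture it here.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    # Same pattern as the Perl example, with \Z as the end-of-string anchor.
    print(dump_regex(r"^\A\w{2,5}0{2}\S \n?\Z", re.IGNORECASE))
```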
Sinan Ünür answered Oct 06 '22 21:10


RegexPal is a great, free JavaScript regex tester. Because it uses the JavaScript regex engine, it doesn't have some of the more advanced regex features, but it works pretty well for a lot of regular expressions. The feature I miss most is lookbehind assertions.
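
For readers who have not met lookbehind assertions, here is a minimal sketch in Python, whose re module supports fixed-width lookbehind. The sample text and the price pattern are made up for illustration. The second pattern shows a common workaround for engines without lookbehind: match the prefix as well and pull the wanted part out of a capture group.

```python
import re

text = "cost: $42, id: 17, refund: $5"

# Lookbehind: match digits only when a dollar sign comes right before them.
with_lookbehind = re.findall(r"(?<=\$)\d+", text)

# Workaround without lookbehind: match the "$" too and capture the digits.
with_group = re.findall(r"\$(\d+)", text)

print(with_lookbehind)  # ['42', '5']
print(with_group)       # ['42', '5']
```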

Shawn answered Oct 06 '22 19:10


Most regex bugs fall into three categories:

  • Subtle omissions - leaving out '^' at the start or '$' at the end, or using '*' where you should have used '+'. These are beginner mistakes, but it's common for the buggy regex to still pass all of the automated tests.

  • Accidental success - where part of the regex is completely wrong and is destined to fail in 99% of real-world use, but by sheer dumb luck it manages to pass the half-dozen automated tests you wrote.

  • Too much success - where one part of the regex matches a whole lot more than you thought. For example, the token [^., ]* will also match \r and \n, meaning that your regex can now match multiple lines of text even though you wrapped it in ^ and $.
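
The "subtle omissions" and "too much success" cases are easy to reproduce. The following is a minimal sketch in Python; the patterns and inputs are invented for illustration, and re.MULTILINE is an assumption made here so that ^ and $ anchor at line boundaries.

```python
import re

# Subtle omission: no anchors, so a match anywhere in the input is accepted.
unanchored = re.compile(r"\d{4}")
anchored = re.compile(r"^\d{4}$")
print(bool(unanchored.search("abc12345xyz")))  # True
print(bool(anchored.search("abc12345xyz")))    # False

# Subtle omission: '*' also accepts zero repetitions, '+' needs at least one.
star_on_empty = bool(re.search(r"^\d*$", ""))
plus_on_empty = bool(re.search(r"^\d+$", ""))
print(star_on_empty, plus_on_empty)  # True False

# Too much success: a negated class matches anything not listed,
# including \r and \n, so one "line" match can swallow several lines.
lines = "first\nsecond\nthird"
too_much = re.search(r"^[^., ]*$", lines, re.MULTILINE).group()
print(repr(too_much))  # 'first\nsecond\nthird'

# Listing the line terminators in the class keeps the match on one line.
one_line = re.search(r"^[^., \r\n]*$", lines, re.MULTILINE).group()
print(repr(one_line))  # 'first'
```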

There really is no substitute for properly learning regex. Read the reference manual for your regex engine, and use a tool like RegexBuddy to experiment and familiarize yourself with all of the features, taking special note of any unusual behaviours they can exhibit. If you learn regex properly, you will avoid most of the bugs mentioned above, and you will know how to write a small number of automated tests that cover all of the edge cases without over-testing the obvious (does [A-Z] really match every letter between A and Z? I'd better write 26 variations of the unit test to make sure!).

If you don't learn regex completely, you will need to write a ridiculous number of automated tests to prove that your magical regex is correct.
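
As a sketch of what "a small number of automated tests" might look like, the example below checks a hypothetical pattern at its boundaries instead of enumerating every possible input. The pattern, the case list, and the helper name run_cases are all assumptions made for illustration.

```python
import re

# Hypothetical pattern under test: 2 to 5 uppercase ASCII letters, nothing else.
CODE = re.compile(r"\A[A-Z]{2,5}\Z")

# One case per boundary instead of one case per possible input.
CASES = [
    ("", False),        # empty input
    ("A", False),       # one below the minimum length
    ("AZ", True),       # minimum length, both ends of the range
    ("AMZAZ", True),    # maximum length
    ("ABCDEF", False),  # one above the maximum length
    ("Ab", False),      # character just outside the class
    ("AB\n", False),    # trailing newline must not slip through
    ("AB CD", False),   # separator inside the value
]


def run_cases():
    """Return the inputs whose result differs from the expectation."""
    return [text for text, expected in CASES
            if bool(CODE.search(text)) != expected]


if __name__ == "__main__":
    print("failures:", run_cases())  # failures: []
```

Eight cases cover the length limits, the edges of the character class, and the anchors, which is the kind of coverage the answer argues for over 26 near-identical tests.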

too much php answered Oct 06 '22 20:10