In every programming language I've worked with, regular expression support (if it exists) is basically a black box: there are some functions like match, scan, etc. that take an expression and return something (often a string or an array), but they don't report on what they're doing while they're doing it.
I'm wondering if, in any reasonably popular programming language, there is either built-in or library support for matching regular expressions and providing some kind of real-time output (e.g., to standard out) indicating what's happening.
Update: I appreciate the comments so far; however, I'm not asking about a tool that displays the structure of the regular expression itself, which is what debuggex.com and regexper.com appear to do (though that's very cool!). I meant to ask about providing info during the part where the expression is applied to some input.
Here's a hypothetical example: suppose I had the expression "(foo|bar|baz)" and I test this against the string "baz"; then I'm picturing output that might look like...
testing "foo" - nope
testing "bar" - nope
testing "baz" - found match
Obviously it wouldn't look quite like that, but you get the idea.
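As a rough illustration of the kind of trace being asked for, here is a toy Python sketch. `traced_alternation` is a hypothetical helper, not part of any library: it only understands a parenthesized alternation of plain literal strings, and it compares each alternative against the whole input, which is far simpler than what a real regex engine does.

```python
def traced_alternation(pattern, text, log=print):
    """Toy matcher for patterns of the form "(alt1|alt2|...)" where every
    alternative is a plain literal string. It tries the alternatives in order,
    reports each attempt through `log`, and returns the alternative that
    matches the whole of `text`, or None if none does."""
    if not (pattern.startswith("(") and pattern.endswith(")")):
        raise ValueError("this toy only handles patterns like '(foo|bar|baz)'")
    for branch in pattern[1:-1].split("|"):
        if text == branch:
            log(f'testing "{branch}" - found match')
            return branch
        log(f'testing "{branch}" - nope')
    return None


if __name__ == "__main__":
    traced_alternation("(foo|bar|baz)", "baz")
```

Running it as a script prints the same three lines as the hypothetical output in the question. Passing a different `log` callable (for example, `list.append`) lets the trace be collected instead of printed.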
String operations are usually faster than regular expression operations for simple tasks, unless, of course, you write the string operations in an inefficient way. Regular expressions have to be parsed and compiled into an internal form (such as bytecode or a state machine) before they can be applied to the input.
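One way to check this for a particular task is to time both approaches. The sketch below uses Python's `timeit` to compare a plain substring test with a precompiled `re.search`; the input string, the search word, and the iteration count are arbitrary choices for illustration, and the timings will vary with the machine, the Python version, and the pattern, so treat any result as specific to that setup.

```python
import re
import timeit

text = "x" * 10_000 + "needle"   # arbitrary test input
pattern = re.compile("needle")  # parse and compile the regex once, up front

# Sanity check: both approaches must give the same answer before we time them.
assert ("needle" in text) == (pattern.search(text) is not None)

plain_time = timeit.timeit(lambda: "needle" in text, number=2_000)
regex_time = timeit.timeit(lambda: pattern.search(text), number=2_000)
print(f"str 'in':  {plain_time:.4f} s")
print(f"re.search: {regex_time:.4f} s")
```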
There are also two types of regular expressions defined by POSIX: the "basic" regular expression and the "extended" regular expression. A few utilities, such as awk and egrep, use extended expressions; most others use basic ones. From now on, if I talk about a "regular expression," I mean a feature found in both types.
A regex engine executes the regex one character at a time, in left-to-right order. The input string is also processed one character at a time, left to right. Once a character is matched, it is said to be consumed from the input, and the engine moves on to the next input character. By default, quantifiers such as * and + are greedy: they consume as much input as they can and give characters back only if the rest of the pattern then fails to match.
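To make the "consumed" and "greedy" vocabulary concrete, here is a toy backtracking matcher in Python, loosely in the style of the small recursive matchers often used to teach how regex engines work. `traced_match` is a hypothetical name; the sketch handles only literal characters, `.`, and a postfix `*`, anchors at the start of the text (like Python's `re.match`), and logs each step.

```python
def traced_match(pat, text, log=print, i=0):
    """Toy backtracking matcher, anchored at index i of text.

    Supports literal characters, '.', and a postfix '*' (greedy). Returns the
    index just past the match, or None. Each step is reported through `log`.
    """
    if not pat:
        return i  # whole pattern consumed: success
    if len(pat) >= 2 and pat[1] == "*":
        # Greedy: consume as many copies of pat[0] as possible first ...
        j = i
        while j < len(text) and pat[0] in (".", text[j]):
            j += 1
        log(f"{pat[0]}* consumed {text[i:j]!r}")
        # ... then give characters back one at a time until the rest matches.
        while j >= i:
            end = traced_match(pat[2:], text, log, j)
            if end is not None:
                return end
            if j > i:
                log(f"{pat[0]}* gives back {text[j - 1]!r}")
            j -= 1
        return None
    if i < len(text) and pat[0] in (".", text[i]):
        log(f"{pat[0]!r} consumed {text[i]!r} at index {i}")
        return traced_match(pat[1:], text, log, i + 1)
    log(f"{pat[0]!r} failed at index {i}")
    return None


if __name__ == "__main__":
    traced_match(".*b", "abab")
```

With the pattern `.*b` and the input `abab`, the log shows `.*` first consuming all four characters, then `b` failing at the end of the input, then `.*` giving one character back, after which `b` is consumed at index 3. That is the greedy-then-backtrack behavior described above.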
Regular expressions are efficient in that one line of code can save you from writing hundreds of lines. But they're normally slower (even when precompiled) than thoughtful hand-written code, simply because of the overhead. Generally, the simpler the task, the worse regular expressions fare by comparison; they're better suited to complex operations.
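To illustrate the "one line instead of many" side of the trade-off, here is a small Python comparison: extracting runs of digits from a string by hand versus with a single `re.findall` call. Both function names and the sample string are hypothetical. The pattern uses `[0-9]` rather than `\d` because in Python 3 `\d` also matches non-ASCII Unicode digits, which the hand-written loop does not; nothing is benchmarked here.

```python
import re


def ints_by_hand(s):
    """Collect runs of ASCII digits by scanning the string manually."""
    out, current = [], ""
    for ch in s:
        if "0" <= ch <= "9":
            current += ch
        elif current:
            out.append(current)
            current = ""
    if current:
        out.append(current)
    return out


def ints_by_regex(s):
    # [0-9]+ restricts matches to ASCII digits, mirroring ints_by_hand.
    return re.findall(r"[0-9]+", s)


sample = "order 66 shipped 2 of 10 items"
print(ints_by_hand(sample))   # ['66', '2', '10']
print(ints_by_regex(sample))  # ['66', '2', '10']
```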
Several regular expression libraries are written in such a way that you can get state-by-state processing information. In particular, Russ Cox wrote an article on regular expressions that included bits of code and an API for transitioning state by state:
http://swtch.com/~rsc/regexp/regexp1.html
The code used in the article was expanded into a complete, simple regex library that appears to give step-by-step output similar to what you described:
https://code.google.com/p/re1/
Later, the code was more fully worked out and is now a full-blown regex library maintained (and used internally) by Google:
https://code.google.com/p/re2/
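The central idea in Russ Cox's article is to simulate a Thompson NFA, tracking every state the machine could be in at once instead of backtracking. Below is a small Python sketch of that idea with a trace added; it is not the code from the article, re1, or re2. It supports only literal characters, `|`, `*`, and parentheses, builds the NFA with explicit epsilon states (a simplification of the article's construction), and requires the whole input to match. `compile_nfa`, `nfa_full_match`, and the log format are hypothetical names chosen for illustration.

```python
class State:
    """NFA state: consume `char` and go to out[0], or, if `char` is None,
    go to every state in `out` without consuming input (an epsilon move)."""

    def __init__(self, char=None):
        self.char = char
        self.out = []


# A fragment is a (start, end) pair; `end` is an epsilon state with no
# outgoing edges yet, so it can be wired to whatever comes next.
def alternate(a, b):
    start, end = State(), State()
    start.out += [a[0], b[0]]
    a[1].out.append(end)
    b[1].out.append(end)
    return start, end


def star(a):
    start, end = State(), State()
    start.out += [a[0], end]  # enter the loop, or skip it
    a[1].out += [a[0], end]   # after one pass: repeat, or leave
    return start, end


def compile_nfa(pattern):
    """Recursive-descent parser for literals, '|', '*', and parentheses."""
    pos = 0
    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def parse_alt():
        nonlocal pos
        frag = parse_seq()
        while peek() == "|":
            pos += 1
            frag = alternate(frag, parse_seq())
        return frag

    def parse_seq():
        start = end = State()  # empty sequence: one epsilon state
        while peek() not in (None, "|", ")"):
            atom_start, atom_end = parse_atom()
            end.out.append(atom_start)  # concatenation
            end = atom_end
        return start, end

    def parse_atom():
        nonlocal pos
        c = peek()
        pos += 1
        if c == "(":
            frag = parse_alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
        else:  # a literal character
            frag = (State(c), State())
            frag[0].out.append(frag[1])
        while peek() == "*":
            pos += 1
            frag = star(frag)
        return frag

    start, accept = parse_alt()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return start, accept


def epsilon_closure(states):
    seen, stack = set(), list(states)
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            if s.char is None:  # follow epsilon edges only
                stack.extend(s.out)
    return seen


def waiting_for(states):
    return sorted(s.char for s in states if s.char is not None)


def nfa_full_match(pattern, text, log=print):
    """Advance all alternatives in lockstep, logging the live states' needs."""
    start, accept = compile_nfa(pattern)
    current = epsilon_closure([start])
    log(f"start: waiting for {waiting_for(current)}")
    for c in text:
        current = epsilon_closure(s.out[0] for s in current if s.char == c)
        log(f"read {c!r}: waiting for {waiting_for(current)}")
        if not current:
            break  # no live states left, so the match has failed
    log("match" if accept in current else "no match")
    return accept in current
```

For `nfa_full_match("(foo|bar|baz)", "baz")` the log reads `start: waiting for ['b', 'b', 'f']`, then `read 'b': waiting for ['a', 'a']`, `read 'a': waiting for ['r', 'z']`, `read 'z': waiting for []`, and finally `match`. Unlike a backtracking matcher, it never tries `foo`, `bar`, and `baz` one after another; all three alternatives advance together, and the ones that cannot continue simply drop out.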
EDIT
If you compile re2 with DebugDFA set to true in the source code, you will get state-by-state output during processing. However, for many regexes the output may not correspond one-to-one with the actual regular expression, and it is a little esoteric.
Python's regular expression engine does provide some visibility through the re.DEBUG flag, which prints debugging information about the compiled expression. You're asking for something different, though (real-time feedback while the expression is applied to the input), which I'm pretty sure does not exist. I could see it being integrated into an IDE or an enhanced Python shell such as IPython. In my opinion, it would be a fun thing to write and quite useful.
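For reference, here is what using `re.DEBUG` looks like in Python. The sketch compiles the question's example pattern with the flag and captures the dump with `contextlib.redirect_stdout`; the exact text of the dump differs between Python versions, so no particular output is shown. The dump describes the parsed pattern (for this pattern it should include a `BRANCH` node for the alternation) and is printed once at compile time; the later `fullmatch` call produces no trace, which is the gap the question is about.

```python
import contextlib
import io
import re

# re.DEBUG makes the compiler print the parsed form of the pattern to stdout.
# Capture it here so it can be inspected; drop the redirect to see it directly.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    pattern = re.compile(r"(foo|bar|baz)", re.DEBUG)
debug_output = buf.getvalue()
print(debug_output)

# The debug dump happens at compile time only; matching itself stays silent.
print(pattern.fullmatch("baz"))
```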