I'm writing an app that lets a user specify a regular expression. Of course, users make mistakes, so I need a way to handle regular expressions that are unparseable, and give the user some actionable advice on how to fix the problem.
The problem I'm having is that the exceptions thrown by new RegExp("something awful") are not helpful for regex n00bs, and have different messages per browser. For example, given:
try {
    new RegExp("(pie");
} catch (e) {
    console.log(e.message);
}
And it wouldn't surprise me if those message strings are user-language-localized, or that they've drifted over time, making this a crazy knot to untie with exception.message.
My goal is to catch the exception, figure out what it's really about, and put up a much more beginner-friendly message. (And, eventually, to highlight the unmatched paren in this example.)
Is there some other exception identifier I should be using? Is there a better way to tell these apart? Failing all of that, has anyone just collected what all these strings are across the several most popular browsers?
Use PEG.js or JISON to create a regular expression parser. You'll be able to get specific and consistent errors.
This file has a YACC grammar for a regular expression: http://swtch.com/usr/local/plan9/src/cmd/grep/grep.y; it might not be too hard to use it with JISON.
A BNF grammar for Perl regex: http://www.cs.sfu.ca/~cameron/Teaching/384/99-3/regexp-plg.html
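To give a feel for what parsing the pattern yourself buys you, here is a minimal hand-written sketch. It is not PEG.js or JISON and not a full regex grammar: it only scans for unbalanced parentheses, skipping escaped characters and character classes, and reports the position of the offending paren. The names findUnmatchedParen and pointAt are made up for this illustration.

```javascript
// Hypothetical helper (not part of PEG.js or JISON): scans a regex source
// string and reports the first unbalanced parenthesis with its position.
function findUnmatchedParen(source) {
  const openStack = [];   // indices of '(' still waiting for a ')'
  let inClass = false;    // true while inside a [...] character class
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') { i++; continue; }   // skip the escaped character
    if (inClass) {
      if (ch === ']') inClass = false;    // parens inside [...] are literals
      continue;
    }
    if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      openStack.push(i);
    } else if (ch === ')') {
      if (openStack.length === 0) {
        return { index: i, message: "This ')' has no matching '('." };
      }
      openStack.pop();
    }
  }
  if (openStack.length > 0) {
    // Report the innermost group that was left open.
    return { index: openStack[openStack.length - 1], message: "This '(' is never closed." };
  }
  return null; // parentheses are balanced
}

// Hypothetical helper: puts a caret under the reported position.
function pointAt(source, index) {
  return source + '\n' + ' '.repeat(index) + '^';
}

const problem = findUnmatchedParen('(pie');
if (problem) {
  console.log(problem.message);
  console.log(pointAt('(pie', problem.index));
}
```

Because the position and message come from your own code, they are the same in every browser. A real grammar would cover quantifiers, classes and the rest in the same spirit.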
Idea: Figure it all out at runtime. E.g.
var tellMeWhatIDidWrong = (function() {
    // Known-bad patterns, each paired with the advice to show for it.
    var tests = {
        '(': 'You did not close your group... duh!',
        ')': 'You seem to have an unmatched parenthesis.',
        '*': 'That token is illegal in that position'
    };
    // Maps this browser's own error text to the friendlier message.
    var errors = {};
    for (var i in tests) {
        try { new RegExp(i); } catch(e) {
            // Key on whatever follows the last colon of the stringified error.
            errors[String(e).split(':').pop()] = tests[i];
        }
    }
    return function(regexStr) {
        try { new RegExp(regexStr); } catch(e) {
            var key = String(e).split(':').pop();
            if (errors.hasOwnProperty(key)) {
                return errors[key];
            }
            return 'Unknown error';
        }
        return 'Nothing -- it is fine!';
    };
}());
tellMeWhatIDidWrong('(abc?'); // -> "You did not close your group... duh!"
Of course, this will only work well if a browser's in-built error reporting is specific enough. Many of them suck. E.g. Opera gives absolutely no hint as to the issue, so the above won't work well, and neither will any other solution relying on Opera's native error messages.
I would suggest sending regexps off to an app running node.js and getting the nice V8 error messages :)
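A rough sketch of what the server-side check could look like, leaving out the HTTP wiring entirely. The function name checkPattern and the shape of the returned object are made up here, and the message prefix it strips is an assumption about how V8 currently words these errors; if the prefix does not match, the whole message is returned unchanged.

```javascript
// Hypothetical server-side helper for a Node.js app: compiles the pattern
// and returns a small object that could be sent back to the browser as JSON.
function checkPattern(source) {
  try {
    new RegExp(source);
    return { valid: true, detail: null };
  } catch (e) {
    const message = String(e.message);
    // Assumption: V8 words the error as
    //   "Invalid regular expression: /<source>/: <detail>"
    // If that does not hold, fall back to the full message.
    const prefix = 'Invalid regular expression: /' + source + '/: ';
    const detail = message.startsWith(prefix)
      ? message.slice(prefix.length)
      : message;
    return { valid: false, detail: detail };
  }
}

console.log(JSON.stringify(checkPattern('(pie')));
```

The browser would then only ever see one engine's wording, which is the point of the suggestion.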
Following from my comment, I have hacked together a little script to "harvest" the possible error messages and the patterns that cause them.
JSFiddle (tried on Chrome only, I hope the RegExp exception objects have the same structure for other browsers)
The idea is this: You have a working regular expression that uses as many regex features as possible. Then you randomly mutate it (adding, removing or swapping out characters) and try to compile it. You can do this a few thousand times, and collect all the error messages. Hopefully chance is better at coming up with possible malformed patterns than any one of us is.
You should definitely improve the base pattern, to include all regex features provided by JavaScript and include all meta characters in the replacement table. But otherwise, I seem to consistently get 6 possible error messages:
Unterminated group
Invalid group
Nothing to repeat
Unmatched ')'
Unterminated character class
\ at end of pattern
Try running this script in different browsers, analyze the patterns that caused the errors, and from there you should be able to write your tool.
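For readers without access to the fiddle, the harvesting idea described above could be sketched roughly like this. This is not the original script: the function name harvestRegexErrors, the base pattern and the list of meta characters are all illustrative assumptions, and it reads the text from e.message, which may not be where every browser puts it.

```javascript
// Sketch only: randomly mutates a base pattern, tries to compile each
// mutation, and collects the distinct error messages that come back.
function harvestRegexErrors(basePattern, iterations, random) {
  // Illustrative replacement table; extend it to cover more syntax.
  const metaChars = ['(', ')', '[', ']', '{', '}', '*', '+', '?', '|', '\\', '^', '$', '.'];
  const found = new Map(); // normalized message -> first pattern that caused it
  const pick = (n) => Math.floor(random() * n);

  for (let i = 0; i < iterations; i++) {
    const chars = basePattern.split('');
    const pos = pick(chars.length);
    const meta = metaChars[pick(metaChars.length)];
    const kind = pick(3);
    if (kind === 0) chars.splice(pos, 0, meta);  // add a character
    else if (kind === 1) chars.splice(pos, 1);   // remove a character
    else chars[pos] = meta;                      // swap a character
    const mutated = chars.join('');

    try {
      new RegExp(mutated);
    } catch (e) {
      // Some engines echo the pattern inside the message; replace it with a
      // placeholder so identical errors collapse into a single entry.
      const message = String(e.message).split(mutated).join('<pattern>');
      if (!found.has(message)) found.set(message, mutated);
    }
  }
  return found;
}

const seen = harvestRegexErrors('^(a|b)[c-e]+\\d{2,3}(?:f)?$', 5000, Math.random);
for (const [message, example] of seen) {
  console.log(message + '   e.g. ' + example);
}
```

Passing the random function in as a parameter is only there so a seeded generator can be swapped in if you want repeatable runs.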
EDIT:
Okay, as I feared, this does not work in other browsers out of the box, because they store the actual message somewhere else inside the exception object. But judging from your question, you already seem to have figured out where to get the message from in every browser, so the changes you need to make should be minor, I hope.