Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this JavaScript regex crash in the browser?

To my knowledge [ab] and (a|b) should be equivalent in purpose when trying to match against a set of characters. Now, look at two regex:

/^(\s|\u00A0)+|(\s|\u00A0)+$/g
/^[\s\u00A0]+|[\s\u00A0]+$/g

They should both match against whitespaces at the beginning and end of a string (See section on Polyfill here for more info on the regex itself). When using square brackets all things work well, but when you switch to parenthesis even the simplest of strings causes the browser to run seemingly indefinitely. This happens on latest Chrome and Firefox.

This jsfiddle demonstrates this:

a ="a                                                               b";

// Doesn't work
// alert(a.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g,''));
// Works
alert(a.replace(/^[\s\u00A0]+|[\s\u00A0]+$/g,''));

Is this a crazy quirk with the browser's implementation of the regex engine or is there something else about the regex's algorithm which causes this?

like image 931
Parham Avatar asked Mar 15 '23 18:03

Parham


1 Answers

The problem you are seeing is called catastrophic backtracking, as explained here.

First of all, let me simplify and clarify your test case:

a = Array(30).join("\u00a0") + "b";  // A string with 30 consecutive \u00a0
s = Date.now();
t = a.replace(/^(\s|\u00A0)+$/g, '');
console.log(Date.now()-s, a.length);

What's happening is with the second part of the expression: ^(\s|\u00A0)+$. Note that \s matches a number of whitespace characters, including \u00A0 itself. This means both \s and \u00A0 matches each of the 30 \u00A0 characters.

Therefore if you try to match the string with /(\s|\u00A0)+/, you will find that each of the 2^30 different combinations of 30-character whitespace patterns will result in a match. When the regular expression matcher matched the first 30 characters it will try to match end of string ($) and failed, so it backtracks and ends up trying all 2^30 combinations.

Your original string (in jsfiddle, where the one in stackflow is already "normalized" to all spaces) is a \u00a0 \u00a0 ... \u00a0 b with roughly 30 \u00a0 characters, so it took the browser roughly 2^30 effort to complete. It does not hang the browser, but will take a few minutes to complete.

like image 59
Alan Tam Avatar answered Mar 24 '23 05:03

Alan Tam