Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I make the regular expression not result in “catastrophic backtracking”?

When I try to run below code in javascript, the browser hangs because of catastrophic backtracking which loops infinitely probably because of a poorly designed regular expression. I need either an alternative expression or a way to prevent this issue:

string temp = "Testing robustness {parent-area-identifier Some text in between the tokens {parent-area-label}";
var strRegExp = new RegExp(/[{](?:[^{}]+|[{][^{}]*[}])*[}]/g);
var arrMatch = temp.match(strRegExp);
like image 972
Nic Avatar asked Feb 05 '16 11:02

Nic


2 Answers

Your regex looks like it's meant to match balanced braces that have more balanced pairs nested within, but only one level deep. This regex does that without hanging on malformed input:

{[^{}]*(?:{[^{}]*}[^{}]*)*}

This is an example of Jeffrey Friedl's unrolled loop technique. When the first [^{}]* runs out of non-brace characters, the next part tries to match a simple, non-nested brace pair, then goes back to looking for non-braces. That part is looped to allow for multiple nested brace pairs (but all at the same level).

This may appear even more vulnerable to catastrophic backtracking (nested quantifiers, everything optional) but it works because it never has to backtrack, even when no match is possible.

As an aside, you don't need to escape braces as long as it doesn't look like you're trying to use them as part of a quantifier. (In some flavors you would need to escape the left brace, but not JavaScript.)

Also, if you want to match braces nested to an unknown depth, you're out of luck. Some flavors can manage that, but JavaScript is way too limited.

like image 134
Alan Moore Avatar answered Sep 28 '22 08:09

Alan Moore


If you want to select an area without curly brackets, try to use this method:

var temp = "{=rankedArea?metricType=3902&area={parent-area-identifier}:AdministrativeWard} {=rankedArea?metricType=3902&area={parent-area-identifier}:{ward-type-identifier}} {district-short-label}  adfasdfasdfasdf asdf asdf asdf asdf {child-area-short-label}  asdf asdf asdf  {authority-area-short-label} asdfasdfasdfasdf asdf  asdfasdfasdfasdf asdf{=compare?metricType=3343&greater=greater than&equal=equal to&less=less than}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=countAreas?area={ancestor-2-identifier}:{ancestor-1-type-identifier}}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=equivalent?metricDimension=[218][218_Number][Specificethnicity][Ethnicity_AsianorAsianBritish]}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf {=metricTypeMetadata?metricType=3341&returnValue=source}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=value?metricType=3284}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=percent?metricType=518}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=rank?metricType=3287}  asdfasdfasdfasdf asdf asdfasdfasdfasdf asdf{=rankedArea?metricType=3286}  asdfasdfasdfasdf asdf";
var strRegExp = new RegExp(/{(?:[^{}]+|{[^{}]*})*}/g);
var arrMatch = temp.match(strRegExp);
console.log(arrMatch.length);
console.log(arrMatch);

Result:

13
["{=rankedArea?metricType=3902&area={parent-area-identifier}:AdministrativeWard}",
 "{=rankedArea?metricType=3902&area={parent-area-identifier}:{ward-type-identifier}}",
 "{district-short-label}",
 "{child-area-short-label}",
 "{authority-area-short-label}",
 "{=compare?metricType=3343&greater=greater than&equal=equal to&less=less than}",
 "{=countAreas?area={ancestor-2-identifier}:{ancestor-1-type-identifier}}",
 "{=equivalent?metricDimension=[218][218_Number][Specificethnicity][Ethnicity_AsianorAsianBritish]}",
 "{=metricTypeMetadata?metricType=3341&returnValue=source}", "{=value?metricType=3284}",
 "{=percent?metricType=518}",
 "{=rank?metricType=3287}",
 "{=rankedArea?metricType=3286}"]

It works fast, if this algorithm is incorrect, please provide more test cases.

like image 41
Alex Vazhev Avatar answered Sep 28 '22 08:09

Alex Vazhev