Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a theorical expression size limit for "or" operator on Regex.Replace

Tags:

.net

regex

limit

Is there a theorical expression size limit for "or" operator on Regex.Replace such as Regex.Replace("abc","(a|c|d|e...continue say 500000 elements here)","zzz") ?

Any stackoverflowException on .NET's implementation ?

Thanks

like image 633
Bamboo Avatar asked Sep 23 '11 12:09

Bamboo


People also ask

Can I use regex in replace?

How to use RegEx with . replace in JavaScript. To use RegEx, the first argument of replace will be replaced with regex syntax, for example /regex/ . This syntax serves as a pattern where any parts of the string that match it will be replaced with the new substring.

Can you use or in regex?

Alternation is the term in regular expression that is actually a simple “OR”. In a regular expression it is denoted with a vertical line character | . For instance, we need to find programming languages: HTML, PHP, Java or JavaScript.

How do you limit the length of a regular expression?

By combining the interval quantifier with the surrounding start- and end-of-string anchors, the regex will fail to match if the subject text's length falls outside the desired range.

Are regex expressions universal?

No. There are many dialects of regular expressions, though the basic expressions tend to be quite similar.


2 Answers

There is no theoretical limit, though each regular expression engine will have its own implementation limits. In this case, since you are using .NET the limit is due to the amount of memory the .NET runtime can use.

A regular expression with one million alernations works fine for me:

string input = "a<142>c";
var options = Enumerable.Range(0, 1000000).Select(x => "<" + x + ">");
string pattern = string.Join("|", options);
string result = Regex.Replace(input, pattern, "zzz");

Result:

azzzc

It's very slow though. Increasing the number of options to 10 million gives me an OutOfMemoryException.

You probably would benefit from looking at another approach.

like image 74
Mark Byers Avatar answered Sep 23 '22 14:09

Mark Byers


The way regular expressions work mean that the memory requirements and performance for a simple a|b|c.....|x|y|z expression as described are not too bad, even for a very large number of variants.

However, if your expression is even slightly more complex than that, you could cause the expression to lose performance exponentially, as well as massively growing its memory footprint, as an large number of or options like this can cause it to have to do massive amounts of backtracking if other parts of the expression don't match immediately.

You may therefore want to excersise caution either doing this sort of thing. Even if it works now, it would only take a small and relatively innocent change to make the whole thing come to a grinding halt.

like image 30
Spudley Avatar answered Sep 24 '22 14:09

Spudley