
Regex.Replace with compiled option in a cycle

Tags: c#, regex

Good morning,

Let's say I have the following code, which attempts to remove any whitespace from every string in a given list:

foreach (String StrTmp in SomeList)
    Regex.Replace(StrTmp, @"\p{Z}", "", RegexOptions.Compiled);

Since the documentation of RegexOptions.Compiled says that "This yields faster execution but increases startup time", I would like to know if this increased startup time refers to the whole program's startup time or if it refers to the startup of every Regex.Replace function call inside the cycle, thus making the whole cycle slower.

By the way, isn't there a Regex.Remove(.,.) command to remove every occurrence of a given regular expression? It would do the same as the code above, but be shorter and more elegant.

Thank you very much.

asked Nov 22 '10 09:11 by Miguel



3 Answers

It refers to the regex compile time. But the Compiled option is designed for regexes that are created once and used often, so it makes most sense to make it once outside the loop and reuse it.

Regex theRegex = new Regex(@"\p{Z}", RegexOptions.Compiled);
foreach (String StrTmp in SomeList)
{
  // Braces are required: a declaration cannot be the sole embedded statement of a foreach.
  string replacementString = theRegex.Replace(StrTmp, "");
}
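
Two details are left open here: strings are immutable, so the cleaned values have to be stored somewhere, and the question's side note about Regex.Remove. As far as I know no such method is built in, but a small extension method can stand in for it. The following is only a sketch: the `Remove` helper, the `RegexExtensions` class and the sample data are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexExtensions
{
    // Hypothetical helper, not part of .NET: deletes every match of the pattern.
    public static string Remove(this Regex regex, string input)
    {
        return regex.Replace(input, string.Empty);
    }
}

static class Program
{
    static void Main()
    {
        // Assumed sample data; the question does not show what SomeList contains.
        var someList = new List<string> { "a b", " c d ", "e\u00A0f" };

        // Build the compiled regex once, outside the loop.
        var separators = new Regex(@"\p{Z}", RegexOptions.Compiled);

        // Strings are immutable, so collect the cleaned copies in a new list.
        var cleaned = new List<string>();
        foreach (string item in someList)
        {
            cleaned.Add(separators.Remove(item));
        }

        Console.WriteLine(string.Join(",", cleaned)); // prints ab,cd,ef
    }
}
```

An extension method keeps the call site short (`separators.Remove(item)`) without hiding that it is just Replace with an empty replacement string.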

answered Oct 02 '22 17:10 by Michael Low


Referring to MSDN:

In the .NET Framework versions 1.0 and 1.1, all compiled regular expressions, whether they were used in instance or static method calls, were cached. Starting with the .NET Framework 2.0, only regular expressions used in static method calls are cached.

IMHO you should even make it a private member of your class and create it only once in the object's lifecycle, or use a static call (Regex.<something>) so that it is cached. If you take the second approach, note what MSDN says:

When you use static method calls with a large number of regular expressions. By default, the regular expression engine caches the 15 most recently used static regular expressions. If your application uses more than 15 static regular expressions, some regular expressions must be recompiled. To prevent this recompilation, you can increase the Regex.CacheSize property to an appropriate value.

So if you are optimizing for speed, use a Regex instance stored in the object (or even the class), and if memory is your concern, use the static method calls.
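
Both approaches might look roughly like this. It is a sketch, not a recommendation of specific values: the `Cleaner` class, its method names and the cache size of 50 are made up for illustration; only `Regex.CacheSize` and the `Regex` methods come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

class Cleaner
{
    // Instance approach: one compiled Regex, created once and shared by every Cleaner.
    private static readonly Regex Separators =
        new Regex(@"\p{Z}", RegexOptions.Compiled);

    public string Clean(string input)
    {
        return Separators.Replace(input, "");
    }

    // Static-call approach: relies on the regex engine's cache of static patterns.
    public static string CleanStatic(string input)
    {
        return Regex.Replace(input, @"\p{Z}", "");
    }
}

static class Program
{
    static void Main()
    {
        // Only matters for static calls: raise the limit when the application
        // uses more distinct static patterns than the default cache holds.
        Regex.CacheSize = 50; // arbitrary illustrative value

        Console.WriteLine(new Cleaner().Clean("a b c"));  // prints abc
        Console.WriteLine(Cleaner.CleanStatic("a b c"));  // prints abc
    }
}
```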

answered Oct 02 '22 18:10 by Migol


Regular expressions are not cached. Every time you explicitly create a new instance or call Regex.Replace, a new instance is created. If the flags include RegexOptions.Compiled, it is compiled every time.

Therefore the code you provided will be slow. For optimal performance, a regular expression that is used multiple times should be created once and then reused.

Regex re = new Regex(@"\p{Z}", RegexOptions.Compiled);
foreach (String StrTmp in SomeList)
    re.Replace(StrTmp, "");
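
How large the difference is depends on the runtime and on how it caches static calls (the MSDN passage quoted in another answer describes such a cache), so it is worth measuring on your own setup. A rough timing sketch, assuming 100,000 made-up input strings; a single Stopwatch run like this is only indicative, not a proper benchmark:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class Program
{
    static void Main()
    {
        // Assumed test data: 100,000 short strings containing spaces.
        var someList = new List<string>();
        for (int i = 0; i < 100000; i++)
        {
            someList.Add("item " + i + " with spaces");
        }

        // Variant 1: static call with RegexOptions.Compiled inside the loop.
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string s in someList)
        {
            Regex.Replace(s, @"\p{Z}", "", RegexOptions.Compiled);
        }
        sw.Stop();
        Console.WriteLine("static call in loop: " + sw.ElapsedMilliseconds + " ms");

        // Variant 2: one compiled instance created before the loop and reused.
        sw = Stopwatch.StartNew();
        Regex re = new Regex(@"\p{Z}", RegexOptions.Compiled);
        foreach (string s in someList)
        {
            re.Replace(s, "");
        }
        sw.Stop();
        Console.WriteLine("reused instance:     " + sw.ElapsedMilliseconds + " ms");
    }
}
```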

answered Oct 02 '22 19:10 by Athari