using static Regex.IsMatch vs creating an instance of Regex

In C# should you have code like:

    public static string importantRegex = "magic!";

    public void F1(string input){
      //code
      if(Regex.IsMatch(input, importantRegex)){
        //codez in here.
      }
      //more code
    }

    public void main(){
      F1("some input");
      /*
        some stuff happens......
      */
      F1("some other input");
    }

Or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine there is an NFA created in each Regex instance. From what I understand, this NFA creation is non-trivial.

Ben McNiel asked Jan 05 '09 20:01




2 Answers

In a rare departure from my typical egotism, I'm kind of reversing myself on this answer.

My original answer, preserved below, was based on an examination of version 1.1 of the .NET framework. This is pretty shameful, since .NET 2.0 had been out for over three years at the time of my answer, and it contained changes to the Regex class that significantly affect the difference between the static and instance methods.

In .NET 2.0 (and 4.0), the static IsMatch function is defined as follows:

    public static bool IsMatch(string input, string pattern){
        return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
    }

The significant difference here is that little true as the third argument. That corresponds to a parameter named "useCache". When it is true, the parsed tree is retrieved from the cache on the second and subsequent uses.
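To make the caching behaviour concrete, here is a minimal sketch. `Regex.CacheSize` is a real static property that controls how many patterns the static methods keep cached (the documented default is 15). The class name `StaticCacheSketch`, the method `CountMatches`, and the value 50 are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticCacheSketch
{
    // Hypothetical helper: tests one input against several patterns through
    // the static API, so each parsed pattern goes through the static cache.
    public static int CountMatches(string input, string[] patterns)
    {
        // Assumption: if the application cycles through more distinct patterns
        // than the cache holds, entries get evicted and re-parsed, so the
        // limit is raised here (50 is an arbitrary illustrative value).
        if (Regex.CacheSize < patterns.Length)
            Regex.CacheSize = Math.Max(50, patterns.Length);

        int hits = 0;
        foreach (string pattern in patterns)
        {
            // Static call: the parsed pattern is looked up in the cache,
            // or parsed and added to it on first use.
            if (Regex.IsMatch(input, pattern))
                hits++;
        }
        return hits;
    }
}
```

Only the static methods go through this cache; a Regex you construct yourself is not added to it, so holding on to the instance is still the way to avoid any lookup at all.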

This caching eats up most—but not all—of the performance difference between the static and instance methods. In my tests, the static IsMatch method was still about 20% slower than the instance method, but that only amounted to about a half second increase when run 100 times over a set of 10,000 input strings (for a total of 1 million operations).

This 20% slowdown can still be significant in some scenarios. If you find yourself regexing hundreds of millions of strings, you'll probably want to take every step you can to make it more efficient. But I'd bet that 99% of the time, you're using a particular Regex no more than a handful of times, and the extra millisecond you lose to the static method won't be even close to noticeable.

Props to devgeezer, who pointed this out almost a year ago, although no one seemed to notice.

My old answer follows:


The static IsMatch function is defined as follows:

    public static bool IsMatch(string input, string pattern){
        return new Regex(pattern).IsMatch(input);
    }

And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) as a quick shortcut only for patterns that you will use only once. If you will reuse the pattern, it's worth it to reuse a Regex object, too.
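Reusing a Regex object could look something like the following sketch, loosely based on the F1 method from the question. The class name `Worker` and the `input` parameter are assumptions added for the example; they are not in the original code.

```csharp
using System.Text.RegularExpressions;

public class Worker
{
    // Constructed once, when the type is first used, and shared by every call.
    // Regex instances are immutable and thread safe, so sharing one is fine.
    private static readonly Regex ImportantRegex = new Regex("magic!");

    // Hypothetical version of F1 that receives the text to test.
    public bool F1(string input)
    {
        // No pattern parsing happens here; only the match itself runs.
        return ImportantRegex.IsMatch(input);
    }
}
```

Calling `new Worker().F1("some magic! here")` any number of times parses the pattern only once.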

As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The cost of compiling a regular expression object is quite large indeed, and may not be worth it.


Take, as an example, the following:

    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    const int count = 10000;

    string pattern = "^[a-z]+[0-9]+$";
    string input   = "abc123";

    // Static method.
    Stopwatch sw = Stopwatch.StartNew();
    for(int i = 0; i < count; i++)
        Regex.IsMatch(input, pattern);
    Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

    // Reused instance, interpreted (construction is inside the timed region).
    sw.Reset();
    sw.Start();
    Regex rx = new Regex(pattern);
    for(int i = 0; i < count; i++)
        rx.IsMatch(input);
    Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

    // Reused instance, compiled (construction is inside the timed region).
    sw.Reset();
    sw.Start();
    rx = new Regex(pattern, RegexOptions.Compiled);
    for(int i = 0; i < count; i++)
        rx.IsMatch(input);
    Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);

At count = 10000, as listed, the second output is fastest. Increase count to 100000, and the compiled version wins.

P Daddy answered Oct 13 '22 17:10



If you're going to reuse the regular expression multiple times, I'd create it with RegexOptions.Compiled and cache it. There's no point in making the framework parse the regex pattern every time you want it.
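One way to read "create it with RegexOptions.Compiled and cache it" is sketched below, for the case where patterns are only known at run time. This is an illustration, not code from the answer: the class name `CompiledRegexCache` and the method `Get` are hypothetical, while `ConcurrentDictionary.GetOrAdd` and `RegexOptions.Compiled` are standard library members.

```csharp
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class CompiledRegexCache
{
    // Pattern string -> compiled Regex, built on first request.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    // Hypothetical accessor: returns the cached instance for a pattern,
    // compiling it the first time the pattern is seen.
    public static Regex Get(string pattern)
    {
        return Cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));
    }
}
```

Usage would be along the lines of `CompiledRegexCache.Get("^[a-z]+[0-9]+$").IsMatch("abc123")`. The dictionary is unbounded, so this only suits a small, known set of patterns; for a single fixed pattern, a static readonly field is simpler.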

Jon Skeet answered Oct 13 '22 16:10
