I'm looking over Joe Albahari's C# 5.0 in a Nutshell, and in Chapter 26, regarding regular expressions, he states:
In some of the preceding examples, we called a static Regex method repeatedly with the same pattern. An alternative approach in these cases is to instantiate a Regex object with the pattern and then call instance methods . . .
// Code example from the book
Regex r = new Regex (@"sausages?");
Console.WriteLine (r.Match ("sausage")); // sausage
Console.WriteLine (r.Match ("sausages")); // sausages
This is not just a syntactic convenience: under the covers . . . This results in (up to 10 times) faster matching, at the expense of a small initial compilation cost (a few tens of microseconds).
So the curious me wrote a benchmark. This program splits a string, making about 134 million calls (128 rounds of 1,048,576) to each of the static and instance Regex methods, as well as to an alternative way of performing the same task.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        var str = "01/02/03/04/05/06/07/08/09/10";
        var regex = new Regex("/");
        var results = new List<Tuple<string, long>>();
        // 128 rounds; each round times 1024 * 1024 calls per approach.
        for (int j = 0; j < 128; j++) {
            // Static Regex.Split call.
            var s = Stopwatch.StartNew();
            for (var i = 0; i < 1024 * 1024; i++) {
                RegexSplit(str);
            }
            s.Stop();
            results.Add(new Tuple<string, long>("Regex", s.ElapsedTicks));
            // Instance Split call on the pre-built Regex object.
            s = Stopwatch.StartNew();
            for (var i = 0; i < 1024 * 1024; i++) {
                CompiledRegexSplit(str, regex);
            }
            s.Stop();
            results.Add(new Tuple<string, long>("Compiled", s.ElapsedTicks));
            // Plain string.Split for comparison.
            s = Stopwatch.StartNew();
            for (var i = 0; i < 1024 * 1024; i++) {
                StringSplit(str);
            }
            s.Stop();
            results.Add(new Tuple<string, long>("String", s.ElapsedTicks));
            Console.Write(".");
        }
        // Average the elapsed ticks per round for each approach.
        var resultsGroup = from it in results
                           group it by it.Item1
                           into g
                           select new {
                               Type = g.Key,
                               Avg = g.Average(git => git.Item2)
                           };
        resultsGroup.ToList().ForEach(it => Console.WriteLine("{0}: {1:000000000.00}", it.Type, it.Avg));
    }
    static void StringSplit(string str) {
        var split = str.Split('/');
    }
    static void CompiledRegexSplit(string str, Regex regex) {
        var split = regex.Split(str);
    }
    static void RegexSplit(string str) {
        var split = Regex.Split(str, "/");
    }
}
I got the following results (average ticks per round of 1,048,576 calls):
Regex: 12257601.40
Compiled: 10869996.92
String: 01328636.27
That's not quite what I expected based on the book, and I doubt that instantiating one Regex takes 12 million ticks.
This run was in .NET 4.5, x64 release mode.
What is the explanation of the unexpected result?
Your code only creates an instance of a Regex object; it does not compile it. To get an actually compiled Regex, you must pass the RegexOptions.Compiled option. This tells the Regex object that it will be used heavily enough for the up-front cost of compilation to pay off in faster execution.
The reason it is not done automatically is that, for a small number of runs, compiling the regular expression takes longer than the time it saves. The Regex object exists to hold a regular expression together with metadata such as engine options, so it can be used with or without compilation.
The code to do the compilation would be:
var regex = new Regex("/", RegexOptions.Compiled);
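To see both sides of that trade-off on your own machine, something along these lines could be used. It is only a rough sketch, not a rigorous benchmark: the class name CompiledVsInterpreted and the helper TimeSplit are made up for this illustration, the input string and iteration count are borrowed from the question, and the numbers will vary with the runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

static class CompiledVsInterpreted
{
    // Hypothetical helper: runs regex.Split(input) `iterations` times
    // and returns the elapsed Stopwatch ticks.
    public static long TimeSplit(Regex regex, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            regex.Split(input);
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        const string input = "01/02/03/04/05/06/07/08/09/10";
        const int iterations = 1024 * 1024;

        // Touch the Regex class once so one-time type initialization
        // is not charged to the first constructor measured below.
        new Regex("x").IsMatch("x");

        // One-time construction cost of each variant.
        var sw = Stopwatch.StartNew();
        var interpreted = new Regex("/"); // what the question's benchmark used
        long interpretedCtorTicks = sw.ElapsedTicks;

        sw.Restart();
        var compiled = new Regex("/", RegexOptions.Compiled); // compiled to MSIL
        long compiledCtorTicks = sw.ElapsedTicks;

        // Both variants must split identically; only the matching engine differs.
        if (!interpreted.Split(input).SequenceEqual(compiled.Split(input)))
        {
            throw new InvalidOperationException("Split results differ.");
        }

        // Short warm-up so JIT compilation is not part of the timed loops.
        TimeSplit(interpreted, input, 1000);
        TimeSplit(compiled, input, 1000);

        Console.WriteLine("Construct interpreted: {0} ticks", interpretedCtorTicks);
        Console.WriteLine("Construct compiled:    {0} ticks", compiledCtorTicks);
        Console.WriteLine("Split interpreted:     {0} ticks", TimeSplit(interpreted, input, iterations));
        Console.WriteLine("Split compiled:        {0} ticks", TimeSplit(compiled, input, iterations));
    }
}
```

The construction timings show what RegexOptions.Compiled costs up front, and the Split timings show whether that cost is recovered over many calls. For a pattern as trivial as "/", the gain may be small.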