Given the following from MSDN:
"Regex objects can be created on any thread and shared between threads."
I have found that, for performance, it is better NOT to share a Regex
instance between threads, and to give each thread its own instance via the
ThreadLocal<T> class instead.
Could someone please explain why it runs approximately 5 times faster with a thread-local instance?
Here are the results (on an 8 core machine):
'Using Regex singleton' returns 3000000 and takes 00:00:01.1005695
'Using thread local Regex' returns 3000000 and takes 00:00:00.2243880
Source Code:
using System;
using System.Linq;
using System.Threading;
using System.Text.RegularExpressions;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static readonly string str = new string('a', 400);
        static readonly Regex re = new Regex("(a{200})(a{200})", RegexOptions.Compiled);

        // Runs one million matches in parallel and reports the elapsed time.
        static void Test(Func<Regex> regexGettingMethod, string methodDescription)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            var sum = Enumerable.Repeat(str, 1000000)
                .AsParallel()
                .Select(s => regexGettingMethod().Match(s).Groups.Count)
                .Sum();
            sw.Stop();
            Console.WriteLine("'{0}' returns {1} and takes {2}", methodDescription, sum, sw.Elapsed);
        }

        static void Main(string[] args)
        {
            // One Regex instance shared by all worker threads.
            Test(() => re, "Using Regex singleton");

            // One Regex instance per worker thread.
            var threadLocalRe = new ThreadLocal<Regex>(() => new Regex(re.ToString(), RegexOptions.Compiled));
            Test(() => threadLocalRe.Value, "Using thread local Regex");

            Console.Write("Press any key");
            Console.ReadKey();
        }
    }
}
Posting my investigation results.
Let's look at Regex in ILSpy. It holds a reference to a RegexRunner. When a Regex object is matching something, it locks its RegexRunner for the duration of the match. If another request arrives concurrently on the same Regex object, a temporary RegexRunner instance gets created for it. Creating a RegexRunner is expensive, so the more threads share one Regex object, the greater the chance of wasting time creating temporary RegexRunners. I hope Microsoft will fix this, given the era of massive parallelism.
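To make the mechanism above concrete, here is a minimal sketch of the "take the cached runner or build a temporary one" pattern. It is a simplified model, not the actual Regex source: ExpensiveRunner and SharedMatcher are hypothetical names invented for illustration, and the real RegexRunner does far more work than this stand-in.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for RegexRunner: assumed to be costly to construct.
sealed class ExpensiveRunner
{
    public static int Created;                      // number of runners constructed so far
    private readonly int[] _state = new int[4096];  // simulates costly setup

    public ExpensiveRunner() { Interlocked.Increment(ref Created); }

    public int Run(string input)
    {
        _state[0] = input.Length;
        return _state[0];
    }
}

// Hypothetical model of an object that caches exactly one runner.
sealed class SharedMatcher
{
    private ExpensiveRunner _cached;  // null while another caller has it checked out

    public int Match(string input)
    {
        // Atomically take the cached runner; a concurrent caller sees null here.
        ExpensiveRunner runner = Interlocked.Exchange<ExpensiveRunner>(ref _cached, null);
        if (runner == null)
            runner = new ExpensiveRunner();  // the extra cost paid under contention

        try
        {
            return runner.Run(input);
        }
        finally
        {
            // Hand a runner back so the next caller can reuse it.
            Interlocked.Exchange(ref _cached, runner);
        }
    }
}
```

Called from a single thread, SharedMatcher constructs one runner and keeps reusing it. Called from several threads at once, every caller that finds the slot empty builds its own temporary runner, which is the overhead a thread-local instance avoids.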
Another thing: the static members of the Regex class that take the pattern string as a parameter (like Regex.IsMatch(input, pattern)) must also perform badly when the same pattern is being matched on different threads. Regex maintains a cache of RegexRunners for these calls. Two concurrent Regex.IsMatch() calls with the same pattern will try to use the same RegexRunner, and one thread will have to create a temporary RegexRunner.
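One possible way around the static-method case, following the same thread-local idea as in the question, is to keep a small per-thread cache of Regex instances keyed by pattern. This is only a sketch under assumptions: PerThreadRegex is a hypothetical helper, not part of the framework, it always uses RegexOptions.Compiled, and the cache is unbounded, so it only suits a small, fixed set of patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: each thread gets its own Regex per pattern.
static class PerThreadRegex
{
    // [ThreadStatic] gives every thread its own dictionary, so no locking is needed.
    [ThreadStatic]
    private static Dictionary<string, Regex> _cache;

    public static Regex Get(string pattern)
    {
        if (_cache == null)
            _cache = new Dictionary<string, Regex>();

        Regex re;
        if (!_cache.TryGetValue(pattern, out re))
        {
            // Built once per thread and pattern, then reused by that thread only.
            re = new Regex(pattern, RegexOptions.Compiled);
            _cache.Add(pattern, re);
        }
        return re;
    }

    // Drop-in replacement for the Regex.IsMatch(input, pattern) call shape.
    public static bool IsMatch(string input, string pattern)
    {
        return Get(pattern).IsMatch(input);
    }
}
```

A call such as PerThreadRegex.IsMatch(s, "(a{200})(a{200})") inside the parallel query then never shares a Regex instance across threads. The cost is one compiled Regex per thread per pattern, so it is worth measuring before adopting.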
Thanks, Will, for letting me know how questions are handled here when the topic starter has found the answer.