This is just a question to satisfy my curiosity. But to me it is interesting.
I wrote this little simple benchmark. It calls 3 variants of Regexp execution in a random order a few thousand times:
Basically, I use the same pattern but in different ways.
Your ordinary way without any RegexOptions
. Starting with .NET 2.0 these do not get cached. But should be "cached" because it is held in a pretty global scope and not reset.
With RegexOptions.Compiled
With a call to the static Regex.Match(pattern, input)
which does get cached in .NET 2.0
Here is the code:
static List<string> Strings = new List<string>();
static string pattern = ".*_([0-9]+)\\.([^\\.])$";
static Regex Rex = new Regex(pattern);
static Regex RexCompiled = new Regex(pattern, RegexOptions.Compiled);
static Random Rand = new Random(123);
static Stopwatch S1 = new Stopwatch();
static Stopwatch S2 = new Stopwatch();
static Stopwatch S3 = new Stopwatch();
static void Main()
{
int k = 0;
int c = 0;
int c1 = 0;
int c2 = 0;
int c3 = 0;
for (int i = 0; i < 50; i++)
{
Strings.Add("file_" + Rand.Next().ToString() + ".ext");
}
int m = 10000;
for (int j = 0; j < m; j++)
{
c = Rand.Next(1, 4);
if (c == 1)
{
c1++;
k = 0;
S1.Start();
foreach (var item in Strings)
{
var m1 = Rex.Match(item);
if (m1.Success) { k++; };
}
S1.Stop();
}
else if (c == 2)
{
c2++;
k = 0;
S2.Start();
foreach (var item in Strings)
{
var m2 = RexCompiled.Match(item);
if (m2.Success) { k++; };
}
S2.Stop();
}
else if (c == 3)
{
c3++;
k = 0;
S3.Start();
foreach (var item in Strings)
{
var m3 = Regex.Match(item, pattern);
if (m3.Success) { k++; };
}
S3.Stop();
}
}
Console.WriteLine("c: {0}", c1);
Console.WriteLine("Total milliseconds: " + (S1.Elapsed.TotalMilliseconds).ToString());
Console.WriteLine("Adjusted milliseconds: " + (S1.Elapsed.TotalMilliseconds).ToString());
Console.WriteLine("c: {0}", c2);
Console.WriteLine("Total milliseconds: " + (S2.Elapsed.TotalMilliseconds).ToString());
Console.WriteLine("Adjusted milliseconds: " + (S2.Elapsed.TotalMilliseconds*((float)c2/(float)c1)).ToString());
Console.WriteLine("c: {0}", c3);
Console.WriteLine("Total milliseconds: " + (S3.Elapsed.TotalMilliseconds).ToString());
Console.WriteLine("Adjusted milliseconds: " + (S3.Elapsed.TotalMilliseconds*((float)c3/(float)c1)).ToString());
}
Everytime I call it the result is along the lines of:
Not compiled and not automatically cached: Total milliseconds: 6185,2704 Adjusted milliseconds: 6185,2704 Compiled and not automatically cached: Total milliseconds: 2562,2519 Adjusted milliseconds: 2551,56949184038 Not compiled and automatically cached: Total milliseconds: 2378,823 Adjusted milliseconds: 2336,3187176891
So there you have it. Not much, but about 7-8% difference.
It is not the only mystery. I cannot explain why the first way would be that much slower because it is never re-evaluated but held in a global static variable.
By the way, this is on .Net 3.5 and Mono 2.2 which behave exactly the same. On Windows.
So, any ideas, why the compiled variant would even fall behind?
EDIT1:
After fixing the code the results now look like this:
Not compiled and not automatically cached: Total milliseconds: 6456,5711 Adjusted milliseconds: 6456,5711 Compiled and not automatically cached: Total milliseconds: 2668,9028 Adjusted milliseconds: 2657,77574842168 Not compiled and automatically cached: Total milliseconds: 6637,5472 Adjusted milliseconds: 6518,94897724836
Which pretty much obsoletes all of the other questions as well.
Thanks for the answers.
Compiled Regular Expressions If a Regex object is constructed with the RegexOptions. Compiled option, it compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows .
I created a much simpler test that will show you that compiled regular expressions are unquestionably faster than not compiled. Here, the compiled regular expression is 35% faster than the not compiled regular expression.
Regular Expressions (Regex) Regular Expression, or regex or regexp in short, is extremely and amazingly powerful in searching and manipulating text strings, particularly in processing text files. One line of regex can easily replace several dozen lines of programming codes.
A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern which it tries to match to the target string. The simplest atom is a literal, but grouping parts of the pattern to match an atom will require using ( ) as metacharacters.
In the Regex.Match version you are looking for the input in the pattern. Try swapping the parameters around.
var m3 = Regex.Match(pattern, item); // Wrong
var m3 = Regex.Match(item, pattern); // Correct
I noticed similar behavior. I also wondered why the compiled version would be slower, but noticed that above a certain number of calls, the compiled version is faster. So I dug into Reflector a little, and I noticed that for a compiled Regex, there's still a little setup that is performed on first call (specifically, creating an instance of the appropriate RegexRunner
object).
In my test, I found that if I moved both the constructor and an initial throw-away call to the regex outside the timer start, the compiled regex won no matter how many iterations I ran.
Incidentally, the caching that the framework is doing when using static Regex
methods is an optimization that's only needed when using static Regex
methods. This is because every call to a static Regex
method creates a new Regex
object. In the Regex
class's constructor it must parse the pattern. The caching allows subsequent calls of static Regex
methods to reuse the RegexTree
parsed from the first call, thereby avoiding the parsing step.
When you use instance methods on a single Regex
object, then this is not an issue. The parsing is still only performed one time (when you create the object). In addition, you get to avoid running all the other code in the constructor, as well as the heap allocation (and subsequent garbage collection).
Martin Brown noticed that you reversed the arguments to your static Regex
call (good catch, Martin). I think you'll find that if you fix that, the instance (not-compiled) regex will beat the static calls every time. You should also find that, given my findings above, the compiled instance will beat the not-compiled one, too.
BUT: You should really read Jeff Atwood's post on compiled regexes before you go blindly applying that option to every regex you create.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With