In answer to the following question: How to convert MatchCollection to string array
Given the two LINQ expressions:
    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .OfType<Match>() // OfType
        .Select(m => m.Groups[0].Value)
        .ToArray();
and
    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .Cast<Match>() // Cast
        .Select(m => m.Groups[0].Value)
        .ToArray();
OfType<> was benchmarked by user Alex to be slightly faster (and confirmed by myself).
This seems counterintuitive to me, as I'd have thought OfType<> would have to do both an 'is' comparison, and a cast (T).
Any enlightenment would be appreciated as to why this is the case :)
The OfType<TResult>(IEnumerable) method returns only those elements in source that can be cast to type TResult. To instead receive an exception if an element cannot be cast to type TResult, use Cast<TResult>(IEnumerable).
OfType is a filter operation: it keeps the elements of a collection that can be cast to a specified type, selecting elements by their type only.
In LINQ, the Cast operator converts every element of a collection to a specified type, producing a new sequence. If the collection contains an element that cannot be converted (for example, a mix of strings and integers), the conversion fails and an exception is thrown when the sequence is enumerated.
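The difference between the two operators can be illustrated with a small sketch. The class name OfTypeVsCastDemo, its two helper methods, and the mixed ArrayList input are hypothetical and exist only for this illustration; they are not part of the original question.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class OfTypeVsCastDemo
{
    // OfType<string> keeps only the elements that are strings;
    // non-strings and nulls are silently skipped.
    public static string[] FilterStrings(IEnumerable source) =>
        source.OfType<string>().ToArray();

    // Cast<string> converts every element and throws InvalidCastException
    // on the first element that is not a string. The exception surfaces
    // during enumeration (here, inside ToArray), not at the Cast call.
    public static bool CastThrows(IEnumerable source)
    {
        try
        {
            source.Cast<string>().ToArray();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Hypothetical mixed input, chosen only for illustration.
        var mixed = new ArrayList { "one", 2, "three" };

        Console.WriteLine(string.Join(", ", FilterStrings(mixed))); // one, three
        Console.WriteLine(CastThrows(mixed));                        // True
    }
}
```

With a MatchCollection every element is a Match, so both operators yield the same sequence and the choice only affects how each element is checked.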
My benchmarking does not agree with your benchmarking.
I ran an identical benchmark to Alex's and got the opposite result. I then tweaked the benchmark somewhat and again observed Cast being faster than OfType.
There's not much in it, but I believe that Cast does have the edge, as it should because its iterator is simpler. (No 'is' check.)
Edit: Actually, after some further tweaking I managed to get Cast to be 50x faster than OfType.
Below is the code of the benchmark that gives the biggest discrepancy I've found so far:
    using System;
    using System.Diagnostics;
    using System.Linq;

    Stopwatch sw1 = new Stopwatch(); // accumulates time spent in OfType
    Stopwatch sw2 = new Stopwatch(); // accumulates time spent in Cast

    // Build the input once, so that producing the strings is not timed.
    var ma = Enumerable.Range(1, 100000).Select(i => i.ToString()).ToArray();

    // Run both methods once, untimed, before measuring.
    var x = ma.OfType<string>().ToArray();
    var y = ma.Cast<string>().ToArray();

    for (int i = 0; i < 1000; i++)
    {
        // Alternate which method runs first on each iteration.
        if (i % 2 == 0)
        {
            sw1.Start();
            var arr = ma.OfType<string>().ToArray();
            sw1.Stop();
            sw2.Start();
            var arr2 = ma.Cast<string>().ToArray();
            sw2.Stop();
        }
        else
        {
            sw2.Start();
            var arr2 = ma.Cast<string>().ToArray();
            sw2.Stop();
            sw1.Start();
            var arr = ma.OfType<string>().ToArray();
            sw1.Stop();
        }
    }

    Console.WriteLine("OfType: " + sw1.ElapsedMilliseconds.ToString());
    Console.WriteLine("Cast: " + sw2.ElapsedMilliseconds.ToString());
    Console.ReadLine();
Tweaks I've made:
On my machine this results in ~350 ms for Cast and ~18000 ms for OfType.
I think the biggest difference is that we're no longer timing how long MatchCollection takes to find the next match. (Or, in my code, how long int.ToString() takes.) Timing that extra work drastically reduces the signal-to-noise ratio.
Edit: As sixlettervariables pointed out, the reason for this massive difference is that Cast will short-circuit and not bother casting individual items if it can cast the whole IEnumerable. When I switched from using Regex.Matches to an array to avoid measuring the regex processing time, I also switched to using something castable to IEnumerable<string> and thus activated this short-circuiting. When I altered my benchmark to disable this short-circuiting, I got a slight advantage for Cast rather than a massive one.
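The short-circuit can be observed directly with a reference comparison. The following is a minimal sketch, not the benchmark I actually ran: the class CastShortCircuitDemo and its helper methods are hypothetical names, and using an object[] as the source is just one assumed way to keep the input from being castable to IEnumerable<string>.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class CastShortCircuitDemo
{
    // True when Cast<string> hands back the source object itself
    // instead of wrapping it in a per-element iterator.
    public static bool CastReturnsSource(IEnumerable source) =>
        ReferenceEquals(source.Cast<string>(), source);

    // OfType<string> must still filter (for example, drop nulls),
    // so it always wraps the source in a new sequence.
    public static bool OfTypeReturnsSource(IEnumerable source) =>
        ReferenceEquals(source.OfType<string>(), source);

    public static void Main()
    {
        // A string[] already implements IEnumerable<string>.
        string[] strings = { "1", "2", "3" };

        // A genuine object[] holding strings does not implement
        // IEnumerable<string>, so Cast has to check each element.
        object[] objects = { "1", "2", "3" };

        Console.WriteLine(CastReturnsSource(strings));   // True
        Console.WriteLine(CastReturnsSource(objects));   // False
        Console.WriteLine(OfTypeReturnsSource(strings)); // False
    }
}
```

A benchmark that feeds both methods the object[] source measures the per-element iterators on equal terms, which is the case where the gap shrinks to a slight advantage.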