The following C# code seems to run slower when built with VS2010 than with VS2008: on a Core i5 Win7 x64 8 GB RAM PC, the VS2008 built version sorts strings in about 7.5 seconds, instead the VS2010 built version requires about 9 seconds. Why is that?
Is there anything wrong with my code?
Did the sorting algorithm change in VS2010?
Is there anything different in the underlying CLR that makes the performance worse?
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Linq;
namespace StringSortCSharp
{
/// <summary>
/// Console app to test string sorting performance in C#.
/// </summary>
class Program
{
/// <summary>
/// Displays the first lines from a vector of strings.
/// </summary>
/// <param name="wishedN">Number of lines to display.</param>
/// <param name="lines">Source lines to display.</param>
private static void DisplayFirst(int wishedN, List<string> lines)
{
int n = Math.Min(wishedN, lines.Count);
for (int i = 0; i < n; i++)
{
Console.WriteLine(" " + lines[i]);
}
Console.WriteLine();
}
/// <summary>
/// Used for random permutation.
/// </summary>
private static Random random = new Random();
/// <summary>
/// Computes a random permutation of the input sequence.
///
/// From:
/// http://stackoverflow.com/questions/375351/most-efficient-way-to-randomly-sort-shuffle-a-list-of-integers-in-c-sharp
///
/// </summary>
/// <typeparam name="T">Type stored in the sequences.</typeparam>
/// <param name="sequence">Input sequence.</param>
/// <returns>Random permutation of the input sequence.</returns>
private static IEnumerable<T> RandomPermutation<T>(IEnumerable<T> sequence)
{
T[] retArray = sequence.ToArray();
for (int i = 0; i < retArray.Length - 1; i += 1)
{
int swapIndex = random.Next(i + 1, retArray.Length);
T temp = retArray[i];
retArray[i] = retArray[swapIndex];
retArray[swapIndex] = temp;
}
return retArray;
}
/// <summary>
/// Builds a list of strings used in the performance benchmark.
/// </summary>
/// <returns>Test list of strings.</returns>
private static List<string> BuildTestLines()
{
// Start with "Lorem ipsum", and repeat it several times, adding some suffix strings.
var lorem = new string[]
{
"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
"Maecenas porttitor congue massa. Fusce posuere, magna sed",
"pulvinar ultricies, purus lectus malesuada libero,",
"sit amet commodo magna eros quis urna.",
"Nunc viverra imperdiet enim. Fusce est. Vivamus a tellus.",
"Pellentesque habitant morbi tristique senectus et netus et",
"malesuada fames ac turpis egestas. Proin pharetra nonummy pede.",
"Mauris et orci."
};
int repeatCount = 200 * 1000;
Console.Write("Building test strings");
var testLines = new List<string>();
Console.Write(" (total string count = {0})", repeatCount * lorem.Length);
Console.Write("...");
for (int i = 0; i < repeatCount; i++)
{
for (int j = 0; j < lorem.Length; j++)
{
// Add more stuff to Lorem strings
testLines.Add(lorem[j] + " (#" + i + ")");
}
}
Console.WriteLine("done.");
DisplayFirst(5, testLines);
Console.WriteLine();
// Shuffle the previously built strings.
Console.Write("Shuffling strings...");
var randomLines = new List<string>(RandomPermutation(testLines));
Console.WriteLine("done.");
DisplayFirst(5, randomLines);
Console.WriteLine();
return randomLines;
}
/// <summary>
/// Sort the input lines.
/// </summary>
/// <param name="lines">Input lines to sort.</param>
private static void Test(List<string> lines)
{
// Stopwatch to measure time performance
var timer = new Stopwatch();
Console.Write("Sorting " + lines.Count + " lines...");
// Sort benchmark
timer.Start();
lines.Sort();
timer.Stop();
Console.WriteLine("done.");
// Display results
DisplayFirst(5, lines);
Console.WriteLine();
Console.WriteLine((timer.ElapsedMilliseconds / 1000.0).ToString(CultureInfo.InvariantCulture) + " seconds elapsed.");
}
static void Main(string[] args)
{
Console.WriteLine("*** Testing String Sorting in C# ***");
Console.WriteLine();
// Build test lines used for the sort benchmark
List<string> testLines = BuildTestLines();
// Run the sort test
Test(testLines);
}
}
}
Here is a brief outline of sorting algorithms used in .NET versions. It's helpful to remember that List<T>.Sort()
internally uses Array<T>.Sort()
Array
. There have been minor changes to the code, but for the most part, the code remains the same.Yes, but the changes were minor, and doesn't affect performance. Consider a sort against 20 million shuffled integers1:
List<int>.Sort() (20 million) .NET 3.5 .NET 4.0 .NET 4.5 --------- --------- --------- 2.564s 2.565s 2.337s
There's no change between v3.5 and v4.0 in terms of performance. There is a noticeable increase in speed for v4.5. It's clear that it's not the actual sorting algorithm that is making the difference.
Before we jump into your next question, let me share my results of running your actual code on my machine:
List<string>.Sort() (1.6 million) .NET 3.5 .NET 4.0 .NET 4.5 --------- --------- --------- 7.953s 11.267s 10.092s
I get similar results, as you do. These results are a good lead-in to your next question:
Without a doubt. So, what is the difference? The difference is in string comparison implementation. In each step of the sorting algorithm it needs to compare the two strings, and it happens to do it differently between the v2.0 and v4.0 runtime. (See extra notes below)
The easiest way to prove this is to force sorting by ordinal position, instead of culture dependence. Replace lines.Sort();
with lines.Sort(StringComparer.Ordinal);
. Here is what I measured:
List<string>.Sort(StringComparer.Ordinal) (1.6 million) .NET 3.5 .NET 4.0 .NET 4.5 --------- --------- --------- 4.088s 3.76s 3.454s
Now, that looks better! It's more or less what I expected; a steady increase in speed for each version of the framework released. MSDN Suggests that if you're ever doing a non-linguistic comparison on a string, you should use an ordinal comparison.
However, that only solves the problem if your comparison or sorting isn't culture-sensitive. If you need culture-sensitive sorting, it seems you won't be able to get rid of the slower execution time unless you want to revert to the .NET 3.5 framework.
When you don't pass a comparer to List<T>.Sort()
or Array.Sort
, it will use the default comparer. Default comparers for .NET strings uses the comparer from the Thread's current culture. From there, it calls some internal functions in the .NET runtime native libraries.
In v2.0-3.5, it calls COMNlsInfo::Compare
and COMNlsInfo::CompareFast
. Here's what the call stack (kinda) looks like:
String.CompareTo(string) +--System.Globalization.CompareInfo.Compare(string,string,CompareOptions) +--mscorwks.dll!COMNlsInfo::Compare +--mscorwks.dll!COMNlsInfo::CompareFast
Similar source for these functions is visible in the shared source implementation of the CLI (SSCLI). It's located in sscli\clr\src\classlibnative\nls\comnlsinfo.cpp
on lines 1034 and 893, respectively.
In v4.0, however, that call tree changed fairly significantly:
String.CompareTo(string) +--System.Globalization.CompareInfo.Compare(string,string,CompareOptions) +--clr.dll!COMNlsInfo::InternalCompareString +--clr.dll!SortVersioning::SortDllCompareString +--nlssorting.dll!_SortCompareString +--nlssorting.dll!_AsciiCompareString
I wish I could tell you why one is slower than the other, but I have no clue at all and there is no SSCLI for .NET 4.0 to compare against. The major changes to string handling in .NET 4.0 weren't without problems. There have been performance issues related to strings in .NET 4.0, however they don't really apply here.
1All tests were run in a virtual machine. Win 2008R2 x64 w/ 4GB RAM and a virtual quad-core processor. Host machine is Win7 x64 w/ 24GB RAM and a Xeon W3540 (2.93ghz) quad-core (8 Logical processors). Results are an average of 5 runs with the best and worst times removed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With