I am trying to find an efficient way to sort an array of strings based on a numeric value within each string element of the array. I am currently using the Array.Sort(array, customComparer) static method (quick sort), with my custom comparer class (sorting in descending order) being:
class StringComparer : IComparer<string>
{
public int Compare(string a, string b)
{
string s1 = a;
string s2 = b;
Match matchA = Regex.Match(s1, @"\d+$");
Match matchB = Regex.Match(s2, @"\d+$");
long numberA = long.Parse(matchA.Value);
long numberB = long.Parse(matchB.Value);
if (numberB - numberA < 0)
{
return -1;
}
else
{
return 1;
}
}
}
This works very well, but sometimes it takes too much time to sort, with an array of 100 000 strings taking more than a minute on a 2.4Ghz processor. I wonder if there is a more efficient way to accomplish the same. For example, implementing a different sorting algorithm or taking another approach like using a dictionary and sorting on the value (the value being the numeric part of the string). Any suggestions? Thanks in advance!
You're parsing the value for each comparison. I would suggest you parse once, to get a string/long pair, sort that, and then extract the string part afterwards.
Note that your existing code has a bug: it will never return 0, for two strings comparing as equal.
Here's an alternative approach using LINQ (which isn't in-place sorting, but is simple.)
var sorted = unsorted.OrderBy(x => long.Parse(Regex.Match(x, @"\d+$").Value));
.ToList();
(OrderBy projects once to get the keys, then compares keys.)
You are now performing the Regexes O(n log n) times.
Consider looping once over all strings, extracting the numerical value and adding it to a SortedDictionary<long, string>
This requires only O(n) executions of the Reg expression. The rest of the sorting should be comparable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With