This is a continuation of questions like this one.
Are there any guidelines for tweaking the performance? I don't mean gains in big-O, just saving some linear time.
For example, how much does pre-sorting save on either SortedList
or SortedDictionary
?
Say I have a person-class with 3 properties to sort on, one of them is age in years. Should I bucket the objects on age first?
Should I first sort on one property, then use the resulting list/dictionary to sort on two properties and so on?
Any other optimizations that spring to mind?
SortedDictionary is implemented with Binary Search Tree, while SortedList is implemented with two internal arrays for keys and values, respectively. SortedList is more memory-efficient than SortedDictionary, and SortedList is faster than SortedDictionary when it needs to go through all items at once.
A sorted list is a combination of an array and a hash table. It contains a list of items that can be accessed using a key or an index. If you access items using an index, it is an ArrayList, and if you access items using a key, it is a Hashtable. The collection of items is always sorted by the key value.
A SortedList does not allow duplicate keys. Operations on a SortedList object tend to be slower than operations on a Hashtable object because of the sorting. Elements in this collection can be accessed using an integer index.
In C#, SortedSet is a collection of objects in sorted order. It is of the generic type collection and defined under System. Collections. Generic namespace. It also provides many mathematical set operations, such as intersection, union, and difference.
Well, it's an easy win on SortedList. Inserting an item requires a binary search (O(log(n)) to find the insertion point, then a List.Insert (O(n)) to insert the item. The Insert() dominates, populating the list requires O(n^2). If the input items are already sorted then the Insert collapses to O(1) but doesn't affect the search. Populating is now O(nlog(n)). You don't worry how big the Oh is, sorting first is always more efficient. Assuming you can afford the doubled storage requirement.
SortedDictionary is different, it uses a red-black tree. Finding the insertion point requires O(log(n)). Rebalancing the tree might be required afterwards, that also takes O(log(n)). Populating the dictionary thus takes O(nlog(n)). Using sorted input does not change the effort to find the insertion point or rebalancing, it is still O(nlog(n)). Now the Oh matters though, inserting sorted input requires the tree to constant rebalance itself. It works better if the input is random, you don't want sorted input.
So populating SortedList with sorted input and populating SortedDictionary with unsorted input is both O(nlog(n)). Ignoring the cost of providing sorted input, the Oh of SortedList is smaller than the Oh of SortedDictionary. That's an implementation detail due to the way List allocates memory. It only has to do so O(log(n)) times, a red-black tree has to allocate O(n) times. Very small Oh btw.
Notable is that neither one compares favorably over simply populating a List, then calling Sort(). That's also O(nlog(n)). In fact, if input is already accidentally sorted you can bypass the Sort() call, this collapses to O(n). The cost analysis now needs to move to the effort it takes to get the input sorted. It is hard to bypass the fundamental complexity of Sort(), O(nlog(n)). It might not be readily visible, you might get the input sorted by, say, a SQL query. It will just take longer to complete.
The point of using either SortedList or SortedDictonary is to keep the collection sorted after inserts. If you only worry about populating but not mutating then you shouldn't use those collections.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With