When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?

This may appear to be a duplicate of this question, which asks "What’s the difference between SortedList and SortedDictionary?" Unfortunately, the answers do nothing more than quote the MSDN documentation (which clearly states that there are performance and memory use differences between the two) but don't actually answer the question.

In fact (and so this question doesn't get the same answers), according to MSDN:

The SortedList<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

  • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.

  • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList<TKey, TValue>.

  • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

So, clearly this would indicate that SortedList<TKey, TValue> is the better choice unless you need faster insert and remove operations for unsorted data.

The question still remains: given the information above, what are the practical (real-world, business case, etc.) reasons for using a SortedDictionary<TKey, TValue>? The performance information implies that there is really no need to have SortedDictionary<TKey, TValue> at all.

Scott Dorman asked Sep 04 '09 02:09




2 Answers

I'm not sure how accurate the MSDN documentation is on SortedList and SortedDictionary. It seems to be saying both are implemented using a binary search tree. But if the SortedList uses a binary search tree, why would it be much slower on additions than SortedDictionary?

Anyway, here are some performance test results.

Each test operates on a SortedList / SortedDictionary containing 10,000 int32 keys. Each test is repeated 1,000 times (Release build, Start without Debugging).

The first group of tests adds keys in sequence from 0 to 9,999. The second group of tests adds randomly shuffled keys from 0 to 9,999 (every number is added exactly once).
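The setup described above could be sketched roughly as follows. This is not the code that produced the numbers in this answer; the class name, the helper names `ShuffledKeys` and `TimeAdds`, the seed, and the reduced repeat count are all assumptions made for illustration. Both collections are driven through `IDictionary<int, int>`, which adds a small, equal overhead to each.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SortedCollectionBenchmark
{
    // Returns the keys 0..count-1 in random order (Fisher-Yates shuffle).
    public static int[] ShuffledKeys(int count, int seed)
    {
        var keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = i;

        var rng = new Random(seed);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);   // 0 <= j <= i
            int tmp = keys[i];
            keys[i] = keys[j];
            keys[j] = tmp;
        }
        return keys;
    }

    // Times `repeats` rounds of filling a fresh collection with the given keys.
    public static long TimeAdds(Func<IDictionary<int, int>> create, int[] keys, int repeats)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++)
        {
            IDictionary<int, int> target = create();
            foreach (int key in keys) target.Add(key, key);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Hypothetical parameters: 10,000 keys as in the answer, fewer repeats.
        int[] keys = ShuffledKeys(10000, 42);

        Console.WriteLine("SortedDictionary Add unsorted: {0} ms",
            TimeAdds(() => new SortedDictionary<int, int>(), keys, 100));
        Console.WriteLine("SortedList Add unsorted: {0} ms",
            TimeAdds(() => new SortedList<int, int>(), keys, 100));
    }
}
```

Run it from a Release build without a debugger attached, as the answer does; absolute timings will differ by machine and runtime.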

***** Tests.PerformanceTests.SortedTest
SortedDictionary Add sorted: 4411 ms
SortedDictionary Get sorted: 2374 ms

SortedList Add sorted: 1422 ms
SortedList Get sorted: 1843 ms

***** Tests.PerformanceTests.UnsortedTest
SortedDictionary Add unsorted: 4640 ms
SortedDictionary Get unsorted: 2903 ms

SortedList Add unsorted: 36559 ms
SortedList Get unsorted: 2243 ms

As with any profiling, the important thing is the relative performance, not the actual numbers.

As you can see, on sorted data SortedList is faster than SortedDictionary. On unsorted data SortedList is slightly quicker on retrieval, but about 8 times slower on adding (36559 ms versus 4640 ms).

If both are using binary trees internally, it is quite surprising that the Add operation on unsorted data is so much slower for SortedList. It is possible that SortedList is also adding items to a sorted linear data structure at the same time, which would slow it down.

However, you would then expect the memory usage of a SortedList to be greater than, or at least equal to, that of a SortedDictionary. But this contradicts what the MSDN documentation says.

Ash answered Oct 01 '22 02:10


I don't know why MSDN says that SortedList<TKey, TValue> uses a binary tree for its implementation, because if you look at the code with a decompiler like Reflector you realize it's not true.

SortedList<TKey, TValue> is simply an array (in practice, a pair of parallel arrays, one for keys and one for values) that grows over time.

Every time you insert an element, it first checks whether the array has enough capacity. If not, a bigger array is allocated and the old elements are copied into it (like List<T>).

After that, it searches for where to insert the element, using a binary search (this is possible since the array is indexable and already sorted).

To keep the array sorted, it moves (or pushes) all the elements located after the insertion position by one position (using Array.Copy()).

E.g.:

// we want to insert "3"

2
4  <= 3
5
8
9
.
.
.

// we have to move some elements first

2
.  <= 3
4
5  |
8  v
9
.
.

That explains why the performance of SortedList is so bad when you insert unsorted elements. It has to re-copy some elements on almost every insertion. The only case where this is not needed is when the element is inserted at the end of the array.

SortedDictionary<TKey, TValue> is different and uses a binary tree to insert and retrieve elements. It also has some cost on insert, because sometimes the tree needs to be re-balanced (but not on every insertion).

Performance is quite similar when searching for an element with SortedList or SortedDictionary, because they both use a binary search.
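The three steps (grow, binary search, shift) can be sketched on a plain `int[]`. This is a simplified illustration, not the decompiled SortedList source: the class name, the `Insert` helper, and the doubling growth policy are assumptions, and duplicates are allowed here only to keep the sketch short (SortedList itself rejects duplicate keys).

```csharp
using System;

public static class SortedArrayInsertSketch
{
    // Inserts `item` into the first `count` (sorted) slots of `items` and
    // returns the array, which is reallocated if it was full.
    public static int[] Insert(int[] items, int count, int item)
    {
        // 1. Grow the backing array when it is full (assumed doubling policy).
        if (count == items.Length)
        {
            var bigger = new int[Math.Max(4, items.Length * 2)];
            Array.Copy(items, bigger, count);
            items = bigger;
        }

        // 2. Binary search for the insertion point. When the item is not found,
        //    the result is the bitwise complement of the index of the first
        //    larger element (or of `count` if there is none).
        int index = Array.BinarySearch(items, 0, count, item);
        if (index < 0) index = ~index;

        // 3. Shift the tail one slot to the right. This is the O(n) part;
        //    it copies nothing when the item lands at the end.
        Array.Copy(items, index, items, index + 1, count - index);

        items[index] = item;
        return items;
    }

    public static void Main()
    {
        int[] items = { 2, 4, 5, 8, 9, 0, 0, 0 };   // 5 used slots, 3 spare
        items = Insert(items, 5, 3);

        var used = new int[6];
        Array.Copy(items, used, 6);
        Console.WriteLine(string.Join(" ", used));   // prints "2 3 4 5 8 9"
    }
}
```

With shuffled input the insertion point falls, on average, in the middle of the array, so each insert shifts about half of the elements.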


In my opinion, you should never use SortedList just to sort an array. Unless you have very few elements, it will always be faster to insert the values into a list (or array) and then call the Sort() method.

SortedList is mostly useful when you have a list of values that is already sorted (e.g. from a database), you want to keep it sorted, and you want to perform operations that take advantage of the ordering (e.g. the ContainsKey() method of SortedList performs a binary search instead of a linear search).

SortedDictionary offers the same advantages as SortedList but performs better if the values to insert are not already sorted.
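A minimal sketch of the two approaches side by side; the class and method names are made up for illustration. Both produce the same ordering, but the first sorts once in O(n log n), while the second may shift elements on every Add.

```csharp
using System;
using System.Collections.Generic;

public static class SortOnceSketch
{
    // Appends everything first, then sorts a single time.
    public static List<int> SortOnce(IEnumerable<int> unsorted)
    {
        var list = new List<int>(unsorted);
        list.Sort();
        return list;
    }

    // Keeps the collection sorted after every Add. Each insertion may shift
    // up to n elements, so unsorted input can cost O(n^2) in total.
    public static IList<int> SortIncrementally(IEnumerable<int> unsorted)
    {
        var sorted = new SortedList<int, int>();
        foreach (int value in unsorted)
        {
            sorted.Add(value, value);   // throws ArgumentException on a duplicate key
        }
        return sorted.Keys;
    }

    public static void Main()
    {
        int[] input = { 5, 2, 9, 4, 8 };
        Console.WriteLine(string.Join(" ", SortOnce(input)));            // prints "2 4 5 8 9"
        Console.WriteLine(string.Join(" ", SortIncrementally(input)));   // prints "2 4 5 8 9"
    }
}
```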
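As an illustration of that use case, the sketch below loads rows that already arrive in ascending key order and then uses the ordering. The `Load` helper and the sample rows are hypothetical; `ContainsKey`, `IndexOfKey`, and the indexable `Keys` property are members of SortedList<TKey, TValue>.

```csharp
using System;
using System.Collections.Generic;

public static class PresortedLookupSketch
{
    // Builds a SortedList from rows that are already in ascending key order.
    // Each Add then lands at the end of the internal arrays, so nothing is shifted.
    public static SortedList<int, string> Load(IList<KeyValuePair<int, string>> sortedRows)
    {
        // Pre-size the list so the arrays never need to be regrown.
        var list = new SortedList<int, string>(sortedRows.Count);
        foreach (KeyValuePair<int, string> row in sortedRows)
        {
            list.Add(row.Key, row.Value);
        }
        return list;
    }

    public static void Main()
    {
        var rows = new List<KeyValuePair<int, string>>
        {
            new KeyValuePair<int, string>(10, "ten"),
            new KeyValuePair<int, string>(20, "twenty"),
            new KeyValuePair<int, string>(30, "thirty")
        };
        SortedList<int, string> list = Load(rows);

        Console.WriteLine(list.ContainsKey(20));   // binary search; prints "True"
        Console.WriteLine(list.IndexOfKey(30));    // position in sort order; prints "2"
        Console.WriteLine(list.Keys[0]);           // indexed access; prints "10"
    }
}
```

Access by position (`Keys[i]`, `Values[i]`, `IndexOfKey`) is something SortedDictionary does not offer, which can matter as much as the memory savings.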


EDIT: If you are using .NET Framework 4.5 (SortedSet<T> has been available since 4.0), an alternative to SortedDictionary<TKey, TValue> is SortedSet<T>. It works the same way as SortedDictionary, using a binary tree, but the keys and values are the same here.
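A short sketch of SortedSet<T> used this way; the class name and the `Build` helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetSketch
{
    // Adds the values in whatever order they arrive; the set keeps them sorted.
    public static SortedSet<int> Build(IEnumerable<int> values)
    {
        var set = new SortedSet<int>();
        foreach (int value in values)
        {
            set.Add(value);   // returns false (no exception) for a duplicate
        }
        return set;
    }

    public static void Main()
    {
        SortedSet<int> set = Build(new[] { 5, 2, 9, 2, 4 });

        Console.WriteLine(string.Join(" ", set));     // prints "2 4 5 9"
        Console.WriteLine(set.Contains(4));           // prints "True"
        Console.WriteLine(set.Min + " " + set.Max);   // prints "2 9"
    }
}
```

Unlike SortedList and SortedDictionary, whose Add throws on a duplicate key, SortedSet<T>.Add simply reports the duplicate through its return value.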

tigrou answered Oct 01 '22 04:10