Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Benchmark on different String dictionaries shows that the normal one is faster, and I don't know why

I'm doing some benchmark on several string dictionaries, because I wanted to find out how performs the different configurations, and I got surprised that the normal Dictionary<String,String> is the faster. Probably because I'm missing some concept or I'm doing something wrong, but these are the results I got:

Collection:             2147482 items.
Random Keys:            1000 keys.

Normal dictionary
        Add 1000 items:         573ms.
        Get 1000 keys:          0ms.

Normal Dictionary with OrdinalIgnoreCase comparer
        Add 1000 items:         642ms.
        Get 1000 keys:          0ms.

Normal Dictionary with InvariantCultureIgnoreCase comparer
        Add 1000 items:         1661ms.
        Get 1000 keys:          0ms.

Sorted dictionary
        Add 1000 items:         11996ms.
        Get 1000 keys:          5ms.

Sorted dictionary with OrdinalIgnoreCase comparer
        Add 1000 items:         11097ms.
        Get 1000 keys:          5ms.

Sorted dictionary with InvariantCultureIgnoreCase comparer
        Add 1000 items:         9814ms.
        Get 1000 keys:          5ms.

An this is the code I'm using to test it:

    static void Main(string[] args)
    {
        String[] col = GenerateCollectionUnsorted(Int32.MaxValue / 1000).ToArray();
        Int32 len = col.Length;
        Console.WriteLine("Collection:\t\t" + len.ToString() + " items.");

        String[] randomKeys = GetRandomKeys(col, 1000);
        Console.WriteLine("Random Keys:\t\t" + randomKeys.Length.ToString() + " keys.");

        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine();
        Console.WriteLine("Normal dictionary");
        TestDic(col, randomKeys, new Dictionary<String, String>());
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine();
        Console.WriteLine("Normal Dictionary with OrdinalIgnoreCase comparer");
        TestDic(col, randomKeys, new Dictionary<String, String>(StringComparer.OrdinalIgnoreCase));
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine();
        Console.WriteLine("Normal Dictionary with InvariantCultureIgnoreCase comparer");
        TestDic(col, randomKeys, new Dictionary<String, String>(StringComparer.InvariantCultureIgnoreCase));
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine();
        Console.WriteLine("Sorted dictionary");
        TestDic(col,randomKeys, new SortedDictionary<String,String>());
        GC.Collect(); 
        GC.WaitForPendingFinalizers();
        
        Console.WriteLine();
        Console.WriteLine("Sorted dictionary with OrdinalIgnoreCase comparer");
        TestDic(col, randomKeys, new SortedDictionary<String, String>(StringComparer.OrdinalIgnoreCase));
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine();
        Console.WriteLine("Sorted dictionary with InvariantCultureIgnoreCase comparer");
        TestDic(col, randomKeys, new SortedDictionary<String, String>(StringComparer.InvariantCultureIgnoreCase));
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("\nEnd");
        Console.ReadKey(true);
    }

    static void TestDic(String[] col, String[] randomKeys, IDictionary<String,String> dic)
    {
        Stopwatch crono = new Stopwatch();
        
        crono.Start();
        foreach (String s in col)
            dic.Add(s, s);
        crono.Stop();

        Console.WriteLine("\tAdd "+randomKeys.Length.ToString()+" items:\t\t" + crono.ElapsedMilliseconds.ToString() + "ms.");

        crono.Reset();
        crono.Start();
        String sv;
        foreach (var rk in randomKeys)
        {
            sv = dic[rk];
            sv.ToString();
        }
        crono.Stop();
        Console.WriteLine("\tGet " + randomKeys.Length.ToString() + " keys:\t\t" + crono.ElapsedMilliseconds.ToString() + "ms.");
    }

    static String[] GetRandomKeys(String[] col, Int32 count)
    {
        Random ran = new Random(DateTime.Now.Millisecond);

        List<Int32> indexSelection = new List<int>();
        List<String> selection = new List<string>();
        
        Int32 len = col.Length;
        
        while (indexSelection.Count < count)
        {
            Int32 value = ran.Next(0, len - 1);
            if (!indexSelection.Contains(value))
            {
                indexSelection.Add(value);
                selection.Add(col[value]);
            }
        }
        return selection.ToArray();
    }

    static IEnumerable<String> GenerateCollection(Int32 count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i.ToString("X").PadLeft(5, '0');
        }
    }

    static IEnumerable<String> GenerateCollectionUnsorted(Int32 amount)
    {
        for (int i = 0; i < amount / 2; i++)
        {
            yield return i.ToString("X").PadLeft(5, '0');
            yield return (amount-i).ToString("X").PadLeft(5, '0');
        }
    }

Any idea why is this happening?

EDIT 1: It was my understanding that sorted dictionary would be slower adding items, and faster getting them because the collection is sorted. And also that comparing strings with a OrdinalIgnoreCase or InvariantCultureIgnoreCase should be faster that a normal comparision, so the seek should be faster. But maybe my understanding is completely wrong :D

Thanks in advance.

EDIT 2: As curiosity, I've made some test on String comparisons:

Collection:             2147482 items.
Random Keys:            1000 keys.

CurrentCulture
        LookUp:         158209ms.

CurrentCultureIgnoreCase
        LookUp:         160710ms.

InvariantCulture
        LookUp:         132765ms.

InvariantCultureIgnoreCase
        LookUp:         133409ms.

Ordinal
        LookUp:         36115ms.

OrdinalIgnoreCase
        LookUp:         36329ms.
like image 214
vtortola Avatar asked Dec 17 '22 19:12

vtortola


2 Answers

Because only the normal dictionary with the default comparer is actually a normal dictionary, all the others are sorted dictionaries.

So, the result is very consistent.

Edit:

With that fixed, the result is different. :)

A sorted dictionary is slower when adding items as they need to be sorted, but it's not faster when getting items. Searching the binary tree is fast, but searching the hash list that a regular dictionary uses is faster. When the binary tree grows, there will be more steps to locate each item, while a dictionary mostly grows by adding more buckets so the number of compares is barely affected.

Comparing strings with Ordinal is faster than regular (CurrentCulture) compare, and OrdinalIgnoreCase is faster than CurrentCultureIgnoreCase, but it's not certain that OrdinalIgnoreCase is faster than CurrentCulture. InvariantCulture compare is not faster than a regular compare, it just uses a different culture. The reason that ordinal compare is a lot faster is that it doesn't have to bother with cultural settings at all.

By the way, I noticed an error in the GetRandomKeys method. It will never pick the last item, as you are getting a random number between 0 and Length - 2.

like image 120
Guffa Avatar answered May 11 '23 17:05

Guffa


From the MSDN documentation on SortedDictionary:

The SortedDictionary<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary.

From the docs on Dictionary:

Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table.

Therefore it should not be surprising that Dictionary can outperform SortedDictionary in many cases.

like image 27
Daniel Pryden Avatar answered May 11 '23 17:05

Daniel Pryden