
Abysmal performance with Dictionary that has a KeyValuePair as key (C# .NET)

In an app that I'm writing I have two potentially large sets of data I need to map against each other. One is a List returned from a web service and one is a DataTable. I need to take the ANSI (or ISO) number for each item in the list and find the row of the DataTable containing that ANSI number and then do stuff with it.

Since DataTable.Select is pretty slow and I would have to do that for each item in the List, I experimented with faster alternatives. Keep in mind that there is no database for the DataTable object. So I can't leverage any SQL capabilities or anything like that.

I thought the fastest way might be to create a dictionary that uses a KeyValuePair as its key, where the pair holds "A" (ANSI) or "I" (ISO) together with the number itself. The dictionary value would be the rest of the row. Creating that dictionary would obviously take a little processing time, but then I could leverage the extremely fast lookup times of the dictionary to find each row I need and add the rows back to the table afterwards. So within the foreach loop over the list, each lookup would only cost O(1) with the dictionary instead of O(n) or whatever DataTable.Select has.
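For illustration, the indexing step described above could be sketched roughly as follows, here with a plain string key. The column name "Ansi", the "A:" prefix convention, and the names RowIndex and BuildIndex are assumptions made for this example, not taken from the actual application.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowIndex
{
    // Builds a lookup from a composite string key such as "A:12345" to its row.
    // Assumes the table has a string column named "Ansi" (hypothetical name).
    public static Dictionary<string, DataRow> BuildIndex(DataTable table)
    {
        var index = new Dictionary<string, DataRow>(table.Rows.Count, StringComparer.Ordinal);
        foreach (DataRow row in table.Rows)
        {
            // "as string" turns DBNull into null, so empty cells are skipped.
            var ansi = row["Ansi"] as string;
            if (!string.IsNullOrEmpty(ansi))
            {
                index["A:" + ansi] = row; // on duplicate numbers the last row wins
            }
        }
        return index;
    }
}
```

Inside the loop over the web service list, a lookup would then be index.TryGetValue("A:" + item.Ansi, out row), which avoids an exception if a number is missing from the table.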

To my surprise it turned out the dictionary was incredibly slow. I couldn't figure out why until I found out that using a string (just ANSI number) instead of a KeyValuePair increased the performance dramatically. I'm talking hundreds of times faster. How on Earth is that possible? Here is how I test:

I generate a List that simulates the output from the web service. I create a dictionary based on that list with a key (either string or KeyValuePair) and the DataRow as value. I go through a foreach loop for that list and search each item in that list in my dictionary and then assign a value to the DataRow that is returned. That's it.

If I use a KeyValuePair as the key to access the dictionary, it takes seconds for 1,000 items; if I modify the dictionary to take only a string as the key, it takes milliseconds for 10,000 items. FYI: I designed the test so that there would always be hits, so all keys are always found.

Here is the block of code for which I'm measuring the time:

foreach(ProductList.Products item in pList.Output.Products)
{
   //KeyValuePair<string, string> kv = new KeyValuePair<string, string>("A", item.Ansi);
   DataRow row = dict[item.Ansi];
   for (int i = 0; i < 10; i++)
   {
      row["Material"] = item.Material + "a"; //Do stuff just for debugging
   }
   hits++;
}

So how on Earth is it possible that the execution time suddenly becomes hundreds of times longer if I use a Dictionary<KeyValuePair<string, string>, DataRow> instead of a Dictionary<string, DataRow>?

asked Nov 13 '15 at 15:11 by user2696330




2 Answers

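KeyValuePair<TKey, TValue> doesn't override the GetHashCode() method, so it falls back to the default implementation inherited from ValueType. That default isn't designed to produce well-distributed hashes; for a struct with reference-type fields it may be derived from the first field alone, which in your case is always "A". This means that the only way to meaningfully organize the dictionary is gone, and you're left with an inefficient linear search.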

This shouldn't be surprising, since it's not what KeyValuePair<TKey, TValue> is designed for: it's the structure the dictionary uses to hand out its entries, not a key type. There's no requirement for .NET objects to be useful keys, and returning 0 from all GetHashCode() calls is perfectly valid.

If you don't want to write your own structure, use Tuple. But I would really just create my own structure for any kind of persistence.
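A custom key structure along those lines might look like the sketch below. The type name ProductKey and its members Kind and Number are hypothetical; the point is only that Equals and GetHashCode are overridden so that both parts of the key take part in hashing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: Kind is "A" (ANSI) or "I" (ISO), Number is the code itself.
public readonly struct ProductKey : IEquatable<ProductKey>
{
    public string Kind { get; }
    public string Number { get; }

    public ProductKey(string kind, string number)
    {
        Kind = kind;
        Number = number;
    }

    // Strongly typed equality; the dictionary uses this and avoids boxing.
    public bool Equals(ProductKey other) =>
        string.Equals(Kind, other.Kind, StringComparison.Ordinal) &&
        string.Equals(Number, other.Number, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is ProductKey other && Equals(other);

    // Combine both fields so that Number contributes to the hash, not just Kind.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Kind == null ? 0 : Kind.GetHashCode());
            hash = hash * 31 + (Number == null ? 0 : Number.GetHashCode());
            return hash;
        }
    }
}
```

The dictionary would then be declared as Dictionary<ProductKey, DataRow> and queried with dict[new ProductKey("A", item.Ansi)]. Tuple.Create("A", item.Ansi) is the shorter alternative, since Tuple<T1, T2> already combines the hash codes of both items.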

As a side-note, DataTable.Select is actually pretty fast for what it's designed for - filtering data for output. It's not really designed for being called hundreds of times in a loop, though - the overhead dominates. This assumes that you have proper indices, of course. In your case, I think the indices are regenerated every time you call Select, which is a bit slow :)

answered Nov 15 '22 at 02:11 by Luaan



You are probably getting a high number of hash collisions with KeyValuePair. You can check this by calling GetHashCode() on a sample of your keys.
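One way to run that check is sketched below. The helper name CountDistinctHashes is made up for this example; it asks the same default comparer that Dictionary uses for each key's hash code and counts how many different values come back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashDiagnostics
{
    // Returns how many distinct hash codes a set of keys produces
    // under the default equality comparer for TKey.
    public static int CountDistinctHashes<TKey>(IEnumerable<TKey> keys)
    {
        var comparer = EqualityComparer<TKey>.Default;
        return keys.Select(k => comparer.GetHashCode(k)).Distinct().Count();
    }
}
```

Feeding it the KeyValuePair keys from the test and then the plain string keys should show how well each kind of key is spread. A count far below the number of keys would support the collision theory.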

The linked question (gethashcode-high-rate-of-duplicates) is about Tuple, but I strongly suspect you have the same thing going on with KeyValuePair. I would mark this as a duplicate, but you may have something else going on.

In this link, Microsoft recommends not using value types as keys. GetHashCode for KeyValuePair is inherited from ValueType.

answered Nov 15 '22 at 03:11 by paparazzo
