Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing sparse dot-product in C#

I'm trying to calculate the dot-product of two very sparse associative arrays. The arrays contain an ID and a value, so the calculation should be done only on those IDs that are common to both arrays, e.g.

[(1, 0.5), (3, 0.7), (12, 1.3)] * [(2, 0.4), (3, 2.3), (12, 4.7)] = (0.7 * 2.3) + (1.3 * 4.7)

My implementation (call it dict) currently uses Dictionaries, but it is too slow to my taste.

double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2)
  {
     double res = 0;
     double val2;
     foreach (KeyValuePair<int, double> p in arr1)
        if (arr2.TryGetValue(p.Key, out val2))
           res += p.Value * val2;
     return res;
  }

The full arrays have about 500,000 entries each, while the sparse ones are only tens to hundreds entries each.

I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat").

Then I tried to change the implementation of the associative array multiplication using an int[] ID array and a double[] values array, walking together on both ID arrays and multiplying when they are equal (let's call this "double").

I then tried to run all three versions with debug or release, with F5 or Ctrl-F5. The results are as follows:

debug F5:    dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double)
debug ^F5:   dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double)
release F5:  dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double)
release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double)

I don't understand these results.
Why isn't the dictionary version optimized in release F5 as do the double and flat versions?
Why is it only slightly optimized in the release ^F5 version while the other two are heavily optimized?

Also, since converting my code into the "double" scheme would mean lots of work - do you have any suggestions how to optimize the dictionary one?

Thanks!
Haggai

like image 499
Haggai Avatar asked Oct 15 '22 06:10

Haggai


1 Answers

I recommend using SortedList<int, double> instead of the Dictionary. Instead of running TryGetValue repeatedly, you can now create two separate Enumerators and walk each list in parallel. Always move forward with whichever list is 'behind' in enumeration and any time you see two enumerated elements equal, you've found a match. Don't have my IDE handy at the moment, but pseudo-code is like this:

Get enumerator for vector A
Get enumerator for vector B
while neither enumerator is at the end
   if index(A) == index(B) then
     this element is included in dot product
     move forward in A and B
   else if index(A) < index(B) then
     move forward in A
   else # index(A) > index(B)
     move forward in B
continue while loop
like image 164
Dan Bryant Avatar answered Nov 15 '22 10:11

Dan Bryant