Say I have two vectors of floats, A and B. I need to find the sign of the dot product of A and B, i.e. sign(A·B): whether it is positive, negative, or zero. The vectors are small, fewer than 100 elements. However, I need to do this extremely fast!
You can assume all elements of A are floats in the range [0, 1] and all elements of B are in [-500, +500]. I was looking for an exact solution, but an approximate solution will do too, as long as it does not give a lot of wrong answers in practice (I know "a lot of" is subjective, but I can't put an exact number on it without talking about hardware or implementation).
I explored compiler directives and optimization flags; -O3 worked the fastest. I also explored implementation changes to make the loop parallelizable via the processor's autovectorization support: for example, with the AVX instruction set, keep 8 independent accumulators and combine them at the end, so the full register width is utilized. But I think we can still go much faster! The basic idea is that we only need the sign of the dot product, so there is a lot of scope to trade precision for speed, and I am trying to come up with a mathematical or algorithmic way to make that tradeoff. One idea I have is to use the FFT (fast Fourier transform) to reduce the number of multiplications. Another idea I tried to explore was bitwise tricks, but I realized bit-level manipulation of floats is not reliable here (IEEE semantics are not guaranteed when you use fast-math flags like -Ofast).
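A minimal sketch of the multiple-accumulator idea described above; the function name and the choice of exactly 8 partial sums are illustrative, not part of any particular library:

```cpp
#include <cstddef>

// Sketch: dot product with 8 independent partial sums, so the compiler's
// autovectorizer can map the inner loop onto one AVX register of
// accumulators instead of serializing on a single sum.
float dot_multi_acc(const float* a, const float* b, std::size_t n) {
    float acc[8] = {0.0f};
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)          // main vectorizable loop
        for (std::size_t k = 0; k < 8; ++k)
            acc[k] += a[i + k] * b[i + k];
    for (; i < n; ++i)                   // scalar tail for n not divisible by 8
        acc[0] += a[i] * b[i];
    float sum = 0.0f;                    // horizontal reduction at the end
    for (float v : acc)
        sum += v;
    return sum;
}
```

Note that the independent accumulators change the summation order, so the result can differ from a strictly sequential sum in the last bits; for a sign test that is usually acceptable.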
You might be wondering why such a minor task is worth optimizing this heavily, but I think it can be a very useful question.
On modern architectures, the floating-point calculation of the dot product is already very fast: roughly 1 cycle for an addition and 1-2 cycles for a multiplication.
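For reference, the straightforward baseline the answer is talking about can be written in a few lines; the function name is just illustrative:

```cpp
#include <cstddef>

// Baseline: plain dot product followed by a three-way sign test.
// Returns -1, 0, or +1 for a negative, zero, or positive dot product.
int sign_of_dot(const float* a, const float* b, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return (s > 0.0f) - (s < 0.0f);
}
```

For n < 100 this loop is short enough that any clever approximation has to beat only a few hundred fused multiply-adds, which is a high bar.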