
Does there exist a way to do small fixed range multiplication efficiently?

Suppose I have an inline function:

inline int mul(short x, short y) {
    return (int)x * (int)y;
}

Here y is in {1,2,...,32}, and x is in {-4,-3,-2,-1,0,1,...,8192}. Considering y is in a very small range, does there exist a way to speed up mul()?

Background: this code is extracted from a scientific computing program written in C/C++. Profiling shows that the function above accounts for over 10% of the whole program's CPU time because it is called very frequently, so I would like to find a way to speed it up.

Thank you :)

ACcreator asked Feb 11 '26 02:02



1 Answer

Intel's SSE intrinsics provide the data type __m128i (available since SSE2), which can hold four 32-bit integers. SSE4.1 adds an intrinsic that multiplies them lane by lane:

__m128i _mm_mullo_epi32(__m128i a, __m128i b)

Packed integer 32-bit multiplication with truncation of upper halves of results.

Reference

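You can perform 4 multiplications at a time. Since you know that your data range is limited (the largest product is 8192 * 32 = 262144, far below 2^31), truncating the upper halves loses nothing. You could also convert to single-precision floating point and use the older SSE multiply (the mulps instruction, exposed as the _mm_mul_ps intrinsic); every product here is below 2^24, so a float represents it exactly.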
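As a rough illustration of the four-at-a-time idea, here is a minimal sketch. It assumes the inputs have already been widened to int32_t and stored in arrays; the helper name mul_array and that array layout are hypothetical and not part of the original program. The vector path is used only when the compiler targets SSE4.1 (for example with -msse4.1 on GCC or Clang); otherwise the plain scalar loop handles everything.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 header, declares _mm_mullo_epi32 */
#endif

/* Hypothetical helper: out[i] = x[i] * y[i] for i in [0, n).
 * Assumes every product fits in 32 bits, which holds for
 * x in [-4, 8192] and y in [1, 32]. */
static void mul_array(const int32_t *x, const int32_t *y,
                      int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE4_1__)
    /* Process four 32-bit lanes per iteration; unaligned loads and
     * stores are used so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop when SSE4.1 is not enabled. */
    for (; i < n; ++i)
        out[i] = x[i] * y[i];
}
```

Whether this helps depends on being able to batch the calls: if mul() is invoked one pair at a time from scattered places, packing and unpacking the vectors may cost more than the multiplications save, so it is worth measuring before and after.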

Besides, it might be a good idea to analyze your program with a profiler like VTune and see if you are suffering from excessive cache misses, aliasing, or alignment problems.

Don Reba answered Feb 13 '26 15:02



