Does there exist a way to do small fixed range multiplication efficiently?

Question

Suppose I have an inline function:

inline int mul(short x, short y) {
    return (int)x * (int)y;
}

Here y is in {1,2,...,32}, and x is in {-4,-3,-2,-1,0,1,...,8192}. Given that y lies in such a small range, is there a way to speed up mul()?

Background: this code is extracted from a scientific computing program written in C/C++. Profiling shows that the function above accounts for more than 10% of the program's total CPU time, because it is called very frequently. I would therefore like to find a way to speed it up.

Thank you :)

Don Reba · Accepted Answer

Intel's SSE intrinsics provide the data type __m128i (available since SSE2), which can hold four 32-bit integers. SSE4.1 adds the following intrinsic:

__m128i _mm_mullo_epi32(__m128i a, __m128i b)

Packed integer 32-bit multiplication with truncation of upper halves of results.

Reference
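As a rough illustration, here is a minimal sketch of how this intrinsic might be applied to the mul() from the question. It assumes the operands are already stored as arrays of int32_t, which the question does not state (it uses short), and the names mul_batch and MUL4_TARGET are made up for this example. The target attribute is a GCC/Clang feature that lets the function use SSE4.1 without a global -msse4.1 flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1: _mm_mullo_epi32 */

/* Hypothetical helper macro: enable SSE4.1 for one function on GCC/Clang. */
#if defined(__GNUC__) || defined(__clang__)
#define MUL4_TARGET __attribute__((target("sse4.1")))
#else
#define MUL4_TARGET
#endif

/* Hypothetical batched variant of mul(): out[i] = x[i] * y[i].
   Assumption: x, y and out are int32_t arrays of length n. */
MUL4_TARGET
static void mul_batch(const int32_t *x, const int32_t *y,
                      int32_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: four 32-bit products per iteration (unaligned access). */
    for (; i + 4 <= n; i += 4) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(vx, vy));
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        out[i] = x[i] * y[i];
}
```

Whether this helps depends on whether the calls to mul() can actually be batched over contiguous data; for isolated scalar calls there is nothing to vectorize.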

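You can perform four multiplications at a time. Since you know that your data range is limited (the largest product is 8192 × 32 = 262144), every result fits comfortably in 32 bits, so truncation won't be a problem. You could also use single-precision floating point and the older mulps instruction (the _mm_mul_ps intrinsic), since integers of this size are still represented exactly in a float.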
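The floating-point variant could look roughly like the sketch below. It needs only SSE2, which is part of the x86-64 baseline. As before, the int32_t array layout and the name mul_batch_ps are assumptions made for the example, not something taken from the question. It relies on every operand and product having a magnitude below 2^24, so the conversions and the multiplication are exact.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: integer <-> float conversions, also pulls in SSE */

/* Hypothetical float-based variant: out[i] = x[i] * y[i].
   Assumption: |x[i]|, |y[i]| and |x[i] * y[i]| are all below 2^24,
   which holds for the ranges given in the question. */
static void mul_batch_ps(const int32_t *x, const int32_t *y,
                         int32_t *out, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Load four integers from each input and convert them to float. */
        __m128 fx = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)(x + i)));
        __m128 fy = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)(y + i)));

        /* mulps, then convert the (exact) products back to integers. */
        __m128i p = _mm_cvtps_epi32(_mm_mul_ps(fx, fy));
        _mm_storeu_si128((__m128i *)(out + i), p);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        out[i] = x[i] * y[i];
}
```

The two conversions add work per element, so this is mainly of interest on hardware without SSE4.1 or when the data is already held as floats.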

Besides, it might be a good idea to analyze your program with a profiler like VTune and see if you are suffering from excessive cache misses, aliasing, or alignment problems.

Tags: c++, c, optimization

ACcreator
