 

Why is it faster to perform float by float matrix multiplication compared to int by int?

Having two int matrices A and B with more than 1000 rows and 10K columns, I often need to convert them to float matrices to gain a speedup (4x or more).

I'm wondering why this is the case. I realize that there is a lot of optimization and vectorization, such as AVX, going on with float matrix multiplication. Yet there are also instructions such as AVX2 for integers (if I'm not mistaken). And can't one make use of SSE and AVX for integers?

Why isn't there a heuristic underneath matrix algebra libraries such as NumPy or Eigen to capture this and perform integer matrix multiplication as fast as float?
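As an illustration of the conversion workaround mentioned above, here is a minimal NumPy sketch (the matrix sizes and value ranges are arbitrary choices; the trick stays exact only while every product and partial sum fits in the integers float32 can represent exactly, i.e. up to 2**24):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small integer entries so that float32 arithmetic stays exact:
# max dot-product value here is 99*99*1000 ≈ 9.8e6 < 2**24.
A = rng.integers(0, 100, size=(1000, 1000), dtype=np.int32)
B = rng.integers(0, 100, size=(1000, 1000), dtype=np.int32)

# Direct integer product.
C_int = A @ B

# The workaround: convert to float, multiply, convert back.
C_via_float = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.int64)

# Both paths give identical results within the exact range.
assert np.array_equal(C_int, C_via_float)
```

Timing the two `@` lines (e.g. with `%timeit` in IPython) typically shows the float path being several times faster, because it dispatches to an optimized BLAS kernel while the integer path does not.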

About the accepted answer: while @sascha's answer is very informative and relevant, @chtz's answer is the actual reason why the int-by-int multiplication is slow, irrespective of whether BLAS integer matrix operations exist.

asked Jul 28 '17 12:07




1 Answer

If you compile these two simple functions which essentially just calculate a product (using the Eigen library)

    #include <Eigen/Core>

    int mult_int(const Eigen::MatrixXi& A, Eigen::MatrixXi& B) {
        Eigen::MatrixXi C = A * B;
        return C(0, 0);
    }

    int mult_float(const Eigen::MatrixXf& A, Eigen::MatrixXf& B) {
        Eigen::MatrixXf C = A * B;
        return C(0, 0);
    }

using the flags -mavx2 -S -O3, you will see very similar assembler code for the integer and the float version. The main difference, however, is that vpmulld has 2-3 times the latency and only 1/2 or 1/4 the throughput of vmulps (on recent Intel architectures).

Reference: Intel Intrinsics Guide. "Throughput" here means the reciprocal throughput, i.e., how many clock cycles each instruction occupies on average when the pipeline is kept full (somewhat simplified).
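The gap described above shows up at the library level too. A rough timing sketch with NumPy (results vary by CPU, BLAS build, and matrix size; this is an illustration, not a definitive measurement):

```python
import time
import numpy as np

def bench(a, b, reps=5):
    # Time reps matrix products and return the best wall-clock time.
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best

n = 1000
rng = np.random.default_rng(0)
Ai = rng.integers(0, 100, size=(n, n), dtype=np.int32)
Af = Ai.astype(np.float32)

# Integer matmul uses NumPy's own C loops; float32 typically
# dispatches to an optimized BLAS sgemm kernel.
print(f"int32  : {bench(Ai, Ai):.4f} s")
print(f"float32: {bench(Af, Af):.4f} s")
```

On a typical AVX2 machine the float32 product comes out several times faster, consistent with the vpmulld/vmulps throughput gap described above.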

chtz answered Oct 21 '22 04:10