 

Why is it faster to perform float by float matrix multiplication compared to int by int?

Having two int matrices A and B with more than 1000 rows and 10K columns, I often need to convert them to float matrices to gain a speedup (4x or more).

I'm wondering why this is the case. I realize that there is a lot of optimization and vectorization, such as AVX, going on with float matrix multiplication. Yet there are also instructions such as AVX2 for integers (if I'm not mistaken). And can't one make use of SSE and AVX for integers?

Why isn't there a heuristic underneath matrix algebra libraries such as NumPy or Eigen to capture this and perform integer matrix multiplication as fast as float?
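As an illustration of the conversion workaround mentioned above, here is a minimal NumPy sketch (the matrix sizes and value ranges are arbitrary choices; the trick stays exact only while every product and partial sum fits in the integers float32 can represent exactly, i.e. up to 2**24):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small integer entries so that float32 arithmetic stays exact:
# max dot-product value here is 99*99*1000 ≈ 9.8e6 < 2**24.
A = rng.integers(0, 100, size=(1000, 1000), dtype=np.int32)
B = rng.integers(0, 100, size=(1000, 1000), dtype=np.int32)

# Direct integer product.
C_int = A @ B

# The workaround: convert to float, multiply, convert back.
C_via_float = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.int64)

# Both paths give identical results within the exact range.
assert np.array_equal(C_int, C_via_float)
```

Timing the two `@` lines (e.g. with `%timeit` in IPython) typically shows the float path being several times faster, because it dispatches to an optimized BLAS kernel while the integer path does not.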

About the accepted answer: while @sascha's answer is very informative and relevant, @chtz's answer is the actual reason why the int-by-int multiplication is slow, irrespective of whether BLAS integer matrix operations exist.

asked Jul 28 '17 12:07




1 Answer

If you compile these two simple functions which essentially just calculate a product (using the Eigen library)

    #include <Eigen/Core>

    int mult_int(const Eigen::MatrixXi& A, Eigen::MatrixXi& B) {
        Eigen::MatrixXi C = A * B;
        return C(0, 0);
    }

    int mult_float(const Eigen::MatrixXf& A, Eigen::MatrixXf& B) {
        Eigen::MatrixXf C = A * B;
        return C(0, 0);
    }

using the flags -mavx2 -S -O3, you will see very similar assembler code for the integer and the float version. The main difference, however, is that vpmulld has 2-3 times the latency and only 1/2 or 1/4 the throughput of vmulps (on recent Intel architectures).

Reference: Intel Intrinsics Guide. "Throughput" here means the reciprocal throughput, i.e., how many clock cycles each instruction occupies on average when the pipeline is kept full (somewhat simplified).
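The gap described above shows up at the library level too. A rough timing sketch with NumPy (results vary by CPU, BLAS build, and matrix size; this is an illustration, not a definitive measurement):

```python
import time
import numpy as np

def bench(a, b, reps=5):
    # Time reps matrix products and return the best wall-clock time.
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best

n = 1000
rng = np.random.default_rng(0)
Ai = rng.integers(0, 100, size=(n, n), dtype=np.int32)
Af = Ai.astype(np.float32)

# Integer matmul uses NumPy's own C loops; float32 typically
# dispatches to an optimized BLAS sgemm kernel.
print(f"int32  : {bench(Ai, Ai):.4f} s")
print(f"float32: {bench(Af, Af):.4f} s")
```

On a typical AVX2 machine the float32 product comes out several times faster, consistent with the vpmulld/vmulps throughput gap described above.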

chtz answered Oct 21 '22 04:10