I have a hand-rolled matrix algorithm which finds the largest number of a right lower square of a square matrix (thus when iterating, some parts are 'jumped' over) - stored as a dense matrix. After an update from vs2010 to vs2017 it seems to be much slower - about a 50% slowdown overall. After some investigation, this was located to the inner loop of a function finding absolute largest value. Looking at the assembler output, this seems be due to some extra mov instructions being inserted within the tight loop. Reworking the loop in different ways seems to solve or partly solve the issue. gcc doesn't seem to have this "issue" in comparison.
Simplified examples (fabs
not always neccessary to reproduce):
#include <cmath>
#include <iostream>
int f_slow(double *A, size_t from, size_t w)
{
double biga_absval = *A;
size_t ir = 0,ic=0;
for ( size_t j = 0; j < w; j++ ) {
size_t n = j*w;
for ( ; n < j*w+w; n++ ) {
if ( fabs(A[n]) <= biga_absval ) {
biga_absval = fabs( A[n] );
ir = j;
ic = n;
}
n++;
}
}
std::cout << ir <<ic;
return 0;
}
int f_fast(double *A, size_t from, size_t w)
{
double* biga = A;
double biga_absval = *biga;
double* n_begin = A + from;
double* n_end = A + w;
for (double* A_n = n_begin; A_n < n_end; ++A_n) {
if (fabs(*A_n) > biga_absval) {
biga_absval = fabs(*A_n);
biga = A_n;
}
}
std::cout << biga;
return 0;
}
int f_faster(double *A, size_t from, size_t w)
{
double biga_absval = *A;
size_t ir = 0,ic=0;
for ( size_t j = 0; j < w; j++ ) {
size_t n = j;
for ( ; n < j*w+w; n++ ) {
if ( fabs(A[n]) > biga_absval ) {
biga_absval = fabs( A[n] );
ir = j;
ic = n - j*w;
}
n++;
}
}
std::cout << ir <<ic;
return 0;
}
Please note: examples were created to look at output only (and indexes etc. don't neccessarily make sense):
https://godbolt.org/z/q9rWwi
So my question is: is this just a (known?) optimizer bug (?) or is there some logic behind what in this case seems like a clear optimization miss ?
Using latest stable vs2017 15.9.5
Update: The extra movs I see is before the jump codes - easiest way to find in compiler explorer is to right click on the if
and then "scroll to".
Well, I don't know why VC gets worse in your case, but I would like to offer some hint how to safe some ops.
void f_faster( const double* A, const std::size_t w ) {
double biga_absval = A[ 0 ];
std::size_t ir, ic_n;
for ( std::size_t j = 0; j < w; ++j ) {
const auto N = j * w + w;
for ( std::size_t n = j; n < N; n += 2 ) {
if ( const auto new_big_a = std::fabs( A[ n ] ); new_big_a > biga_absval ) {
biga_absval = new_big_a;
ir = j;
ic_n = n;
}
}
}
std::cout << ir << ( ic_n - ir * w );
}
Maybe that's already enough to get rid of the extra mov?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With