I want to migrate a piece of code that involves a number of vector and matrix calculations to C or C++, the objective being to speed up the code as much as possible.
Are linear algebra calculations with for loops in C code as fast as calculations using LAPACK/BLAS, or is there some unique speedup from using those libraries?
In other words, could simple C code (using for loops and the like) perform linear algebra calculations as fast as code that utilizes LAPACK/BLAS?
BLAS (Basic Linear Algebra Subprograms) is a library of vector, vector-vector, matrix-vector, and matrix-matrix operations. LAPACK is a library of dense and banded matrix linear algebra routines, covering tasks such as solving linear systems and computing eigenvalue and singular value decompositions.
LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines. LAPACK was designed as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK.
LAPACK is a set of Fortran subroutines covering a wide area of linear algebra algorithms. It was developed with the intention of being portable across a range of parallel processing environments.
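To make that concrete, here is a minimal sketch of calling LAPACK from C to solve a small linear system Ax = b. It assumes a LAPACK build that ships the LAPACKE C interface (e.g. linked with -llapacke -llapack -lblas); the 3x3 data is made up purely for illustration:

#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    /* Row-major 3x3 system A * x = b; dgesv overwrites b with x. */
    double A[3 * 3] = { 2.0, 1.0, 1.0,
                        1.0, 3.0, 2.0,
                        1.0, 0.0, 0.0 };
    double b[3] = { 4.0, 5.0, 6.0 };
    lapack_int ipiv[3];  /* pivot indices from the LU factorization */

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, A, 3, ipiv, b, 1);
    if (info != 0) {
        fprintf(stderr, "dgesv failed, info = %d\n", (int)info);
        return 1;
    }
    printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
    return 0;
}

Note that a single dgesv call bundles the LU factorization and the forward/back substitution; the BLAS underneath does the heavy lifting.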
Vendor-provided LAPACK/BLAS libraries (Intel's IPP/MKL have been mentioned, but there's also AMD's ACML, and other CPU vendors like IBM/Power or Oracle/SPARC provide equivalents as well) are often highly optimized for specific CPU capabilities, which can significantly boost performance on large data sets.
Often, though, you've got very specific small data to operate on (say, 4x4 matrices or 4D dot products, i.e. operations used in 3D geometry processing), and for that sort of thing BLAS/LAPACK are overkill, because these subroutines first run tests to decide which code path to take, depending on properties of the data set. In those situations, simple C/C++ source code, maybe using SSE2 to SSE4 intrinsics and/or compiler-generated vectorization, may beat BLAS/LAPACK.
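For example, a 4D single-precision dot product fits in a handful of SSE instructions with none of that dispatch overhead. A minimal sketch using intrinsics (the hypothetical dot4 helper is just for illustration; _mm_hadd_ps requires SSE3, and the inputs are assumed 16-byte aligned):

#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps */

/* Dot product of two 16-byte-aligned 4-float vectors. */
static float dot4(const float *a, const float *b)
{
    __m128 va   = _mm_load_ps(a);       /* load 4 floats from each input */
    __m128 vb   = _mm_load_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);   /* elementwise multiply */
    prod = _mm_hadd_ps(prod, prod);     /* two horizontal adds sum all lanes */
    prod = _mm_hadd_ps(prod, prod);
    return _mm_cvtss_f32(prod);         /* extract the low lane as a scalar */
}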
That's why, for example, Intel has two libraries: MKL for large linear algebra data sets, and IPP for small (graphics vector) data sets.
Also, regarding "simple for loops": give the compiler the chance to vectorize for you. For instance, something like:
/* Assumes DIM_OF_MY_VECTOR is a multiple of 4. */
double dotprod = 0.0;
size_t i;

/* Elementwise products, manually unrolled by four. */
for (i = 0; i < DIM_OF_MY_VECTOR; i += 4) {
    vecmul[i]   = src1[i]   * src2[i];
    vecmul[i+1] = src1[i+1] * src2[i+1];
    vecmul[i+2] = src1[i+2] * src2[i+2];
    vecmul[i+3] = src1[i+3] * src2[i+3];
}

/* Sum the partial products, again four at a time. */
for (i = 0; i < DIM_OF_MY_VECTOR; i += 4)
    dotprod += vecmul[i] + vecmul[i+1] + vecmul[i+2] + vecmul[i+3];
might be a better feed to a vectorizing compiler than the plain

for (i = 0; i < DIM_OF_MY_VECTOR; i++) dotprod += src1[i] * src2[i];

expression. So a lot depends on what exactly you mean by "calculations with for loops".
If your vector dimensions are large enough, though, the BLAS version,

dotprod = cblas_ddot(DIM_OF_MY_VECTOR, src1, 1, src2, 1);
will be cleaner code and likely faster.
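For completeness, here is a minimal sketch of that call in a full program, assuming a CBLAS implementation such as OpenBLAS or the Netlib reference is installed (e.g. linked with -lcblas or -lopenblas); the vector contents are made up for illustration:

#include <stdio.h>
#include <cblas.h>

#define DIM_OF_MY_VECTOR 8

int main(void)
{
    double src1[DIM_OF_MY_VECTOR] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    double src2[DIM_OF_MY_VECTOR] = { 8, 7, 6, 5, 4, 3, 2, 1 };

    /* ddot: double-precision dot product, both vectors with stride 1. */
    double dotprod = cblas_ddot(DIM_OF_MY_VECTOR, src1, 1, src2, 1);

    printf("dotprod = %g\n", dotprod);
    return 0;
}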