I am implementing the C++ multiplication for matrices with different data structures and techniques (vectors , arrays and OpenMP) and I found a strange situation... My dynamic array version is working better:
times:
openmp mult_1: time: 5.882000 s
array mult_2: time: 1.478000 s
My compilation flags are:
/usr/bin/g++ -fopenmp -pthread -std=c++1y -O3
C++ vector version
typedef std::vector<std::vector<float>> matrix_f;
void mult_1 (const matrix_f & matrixOne, const matrix_f & matrixTwo, matrix_f & result) {
const int matrixSize = (int)result.size();
#pragma omp parallel for simd
for (int rowResult = 0; rowResult < matrixSize; ++rowResult) {
for (int colResult = 0; colResult < matrixSize; ++colResult) {
for (int k = 0; k < matrixSize; ++k) {
result[rowResult][colResult] += matrixOne[rowResult][k] * matrixTwo[k][colResult];
}
}
}
}
Dynamic array version
void mult_2 ( float * matrixOne, float * matrixTwo, float * result, int size) {
for (int row = 0; row < size; ++row) {
for (int col = 0; col < size; ++col) {
for (int k = 0; k < size; ++k) {
(*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col));
}
}
}
}
tests:
C++ vector version
utils::ChronoTimer timer;
/* set Up simple matrix */
utils::matrix::matrix_f matr1 = std::vector<std::vector<float>>(size,std::vector<float>(size));
fillRandomMatrix(matr1);
utils::matrix::matrix_f matr2 = std::vector<std::vector<float>>(size,std::vector<float>(size));
fillRandomMatrix(matr2);
utils::matrix::matrix_f result = std::vector<std::vector<float>>(size,std::vector<float>(size));
timer.init();
utils::matrix::mult_1(matr1,matr2,result);
std::printf("openmp mult_1: time: %f ms\n",timer.now() / 1000);
Dynamic array version
utils::ChronoTimer timer;
float *p_matr1 = new float[size*size];
float *p_matr2 = new float[size*size];
float *p_result = new float[size*size];
fillRandomMatrixArray(p_matr1,size);
fillRandomMatrixArray(p_matr2,size);
timer.init();
utils::matrix::mult_2(p_matr1,p_matr2,p_result,size);
std::printf("array mult_2: time: %f ms\n",timer.now() / 1000);
delete [] p_matr1;
delete [] p_matr2;
delete [] p_result;
I was checking some previous posts, but I couldn't find any related with my problem link, link2, link3:
UPDATE: I refactorized tests with the answers, and vector works slighty better :
vector mult: time: 1.194000 s
array mult_2: time: 1.202000 s
C++ vector version
void mult (const std::vector<float> & matrixOne, const std::vector<float> & matrixTwo, std::vector<float> & result, int size) {
for (int row = 0; row < size; ++row) {
for (int col = 0; col < size; ++col) {
for (int k = 0; k <size; ++k) {
result[(size*row)+col] += matrixOne[(size*row)+k] * matrixTwo[(size*k)+col];
}
}
}
}
Dynamic array version
void mult_2 ( float * matrixOne, float * matrixTwo, float * result, int size) {
for (int row = 0; row < size; ++row) {
for (int col = 0; col < size; ++col) {
for (int k = 0; k < size; ++k) {
(*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col));
}
}
}
}
Also, my vectorized version is working better(0.803 s);
The conclusion is that arrays of integers are faster than vectors of integers (5 times in my example). However, arrays and vectors are arround the same speed for more complex / not aligned data. Save this answer.
Vector are implemented as dynamic arrays with list interface whereas arrays can be implemented as statically or dynamically with primitive data type interface. Size of arrays are fixed whereas the vectors are resizable i.e they can grow and shrink as vectors are allocated on heap memory.
A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.
Array is memory efficient data structure. Vector takes more time in accessing elements. Array access elements in constant time irrespective of their location as elements are arranged in a contiguous memory allocation.
A vector of vectors is analogous to a jagged array -- an array where each entry is a pointer, each pointer pointing at another array of floats.
In comparison, the raw array version is one block of memory, where you do math to find the elements.
Use a single vector, not a vector of vectors, and do the math manually. Or, use a vector of fix-sized std::array
s. Or write a helper type that takes the (one-dimensional) vector of float, and gives you a 2 dimensional view of it.
Data in a contiguous buffer is more cache and optimization friendly than data in a bunch of disconnected buffers.
template<std::size_t Dim, class T>
struct multi_dim_array_view_helper {
std::size_t const* dims;
T* t;
std::size_t stride() const {
return
multi_dim_array_view_helper<Dim-1, T>{dims+1, nullptr}.stride()
* *dims;
}
multi_dim_array_view_helper<Dim-1, T> operator[](std::size_t i)const{
return {dims+1, t+i*stride()};
}
};
template<class T>
struct multi_dim_array_view_helper<1, T> {
std::size_t stride() const{ return 1; }
T* t;
T& operator[](std::size_t i)const{
return t[i];
}
multi_dim_array_view_helper( std::size_t const*, T* p ):t(p) {}
};
template<std::size_t Dims>
using dims_t = std::array<std::size_t, Dims-1>;
template<std::size_t Dims, class T>
struct multi_dim_array_view_storage
{
dims_t<Dims> storage;
};
template<std::size_t Dims, class T>
struct multi_dim_array_view:
multi_dim_array_view_storage<Dims, T>,
multi_dim_array_view_helper<Dims, T>
{
multi_dim_array_view( dims_t<Dims> d, T* t ):
multi_dim_array_view_storage<Dims, T>{std::move(d)},
multi_dim_array_view_helper<Dims, T>{
this->storage.data(), t
}
{}
};
now you can do this:
std::vector<float> blah = {
01.f, 02.f, 03.f,
11.f, 12.f, 13.f,
21.f, 22.f, 23.f,
};
multi_dim_array_view<2, float> view = { {3}, blah.data() };
for (std::size_t i = 0; i < 3; ++i )
{
std::cout << "[";
for (std::size_t j = 0; j < 3; ++j )
std::cout << view[i][j] << ",";
std::cout << "]\n";
}
live example
No data is copied in the view class. It just provides a view of the flat array that is a multi-dimensional array.
Your approaches are quite different:
In the "dynamic array" version you allocate a single chunk of memory for each matrix and map the rows of the matrices onto that one dimensional memory range.
In the "vector" version you use vectors of vectors which are "real" and "dynamically" two dimensional meaning that the storage position of each row of your matrices is unrelated with respect to the other rows.
What you probably want to do is:
Use vector<float>(size*size)
and perform the very same mapping you're doing in the "dynamic array" example by hand or
Write a class that internally handles the mapping for you and provides a 2-dimensional access interface (T& operator()(size_t, size_t)
or some kind of row_proxy operator[](size_t)
where row_proxy
in turn has T& operator[](size_t)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With