I am implementing the C++ multiplication for matrices with different data structures and techniques (vectors , arrays and OpenMP) and I found a strange situation... My dynamic array version is working better: times: <blockquote> openmp mult_1: time: 5.882000 s array mult_2: time: 1.478000 s </blockquote> My compilation flags are: <blockquote> /usr/bin/g++ -fopenmp -pthread -std=c++1y -O3 </blockquote> C++ vector version <pre class="prettyprint"><code>typedef std::vector<std::vector<float>> matrix_f; void mult_1 (const matrix_f & matrixOne, const matrix_f & matrixTwo, matrix_f & result) { const int matrixSize = (int)result.size(); #pragma omp parallel for simd for (int rowResult = 0; rowResult < matrixSize; ++rowResult) { for (int colResult = 0; colResult < matrixSize; ++colResult) { for (int k = 0; k < matrixSize; ++k) { result[rowResult][colResult] += matrixOne[rowResult][k] * matrixTwo[k][colResult]; } } } } </code></pre> Dynamic array version <pre class="prettyprint"><code>void mult_2 ( float * matrixOne, float * matrixTwo, float * result, int size) { for (int row = 0; row < size; ++row) { for (int col = 0; col < size; ++col) { for (int k = 0; k < size; ++k) { (*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col)); } } } } </code></pre> tests: C++ vector version <pre class="prettyprint"><code>utils::ChronoTimer timer; /* set Up simple matrix */ utils::matrix::matrix_f matr1 = std::vector<std::vector<float>>(size,std::vector<float>(size)); fillRandomMatrix(matr1); utils::matrix::matrix_f matr2 = std::vector<std::vector<float>>(size,std::vector<float>(size)); fillRandomMatrix(matr2); utils::matrix::matrix_f result = std::vector<std::vector<float>>(size,std::vector<float>(size)); timer.init(); utils::matrix::mult_1(matr1,matr2,result); std::printf("openmp mult_1: time: %f ms\n",timer.now() / 1000); </code></pre> Dynamic array version <pre class="prettyprint"><code>utils::ChronoTimer timer; float *p_matr1 = new float[size*size]; float *p_matr2 = new float[size*size]; float *p_result = new float[size*size]; fillRandomMatrixArray(p_matr1,size); fillRandomMatrixArray(p_matr2,size); timer.init(); utils::matrix::mult_2(p_matr1,p_matr2,p_result,size); std::printf("array mult_2: time: %f ms\n",timer.now() / 1000); delete [] p_matr1; delete [] p_matr2; delete [] p_result; </code></pre> I was checking some previous posts, but I couldn't find any related with my problem link, link2, link3: UPDATE: I refactorized tests with the answers, and vector works slighty better : <blockquote> vector mult: time: 1.194000 s array mult_2: time: 1.202000 s </blockquote> C++ vector version <pre class="prettyprint"><code>void mult (const std::vector<float> & matrixOne, const std::vector<float> & matrixTwo, std::vector<float> & result, int size) { for (int row = 0; row < size; ++row) { for (int col = 0; col < size; ++col) { for (int k = 0; k <size; ++k) { result[(size*row)+col] += matrixOne[(size*row)+k] * matrixTwo[(size*k)+col]; } } } } </code></pre> Dynamic array version <pre class="prettyprint"><code>void mult_2 ( float * matrixOne, float * matrixTwo, float * result, int size) { for (int row = 0; row < size; ++row) { for (int col = 0; col < size; ++col) { for (int k = 0; k < size; ++k) { (*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col)); } } } } </code></pre> Also, my vectorized version is working better(0.803 s);

A vector of vectors is analogous to a jagged array -- an array where each entry is a pointer, each pointer pointing at another array of floats. In comparison, the raw array version is one block of memory, where you do math to find the elements. Use a single vector, not a vector of vectors, and do the math manually. Or, use a vector of fix-sized <code>std::array</code>s. Or write a helper type that takes the (one-dimensional) vector of float, and gives you a 2 dimensional view of it. Data in a contiguous buffer is more cache and optimization friendly than data in a bunch of disconnected buffers. <pre class="prettyprint"><code>template<std::size_t Dim, class T> struct multi_dim_array_view_helper { std::size_t const* dims; T* t; std::size_t stride() const { return multi_dim_array_view_helper<Dim-1, T>{dims+1, nullptr}.stride() * *dims; } multi_dim_array_view_helper<Dim-1, T> operator[](std::size_t i)const{ return {dims+1, t+i*stride()}; } }; template<class T> struct multi_dim_array_view_helper<1, T> { std::size_t stride() const{ return 1; } T* t; T& operator[](std::size_t i)const{ return t[i]; } multi_dim_array_view_helper( std::size_t const*, T* p ):t(p) {} }; template<std::size_t Dims> using dims_t = std::array<std::size_t, Dims-1>; template<std::size_t Dims, class T> struct multi_dim_array_view_storage { dims_t<Dims> storage; }; template<std::size_t Dims, class T> struct multi_dim_array_view: multi_dim_array_view_storage<Dims, T>, multi_dim_array_view_helper<Dims, T> { multi_dim_array_view( dims_t<Dims> d, T* t ): multi_dim_array_view_storage<Dims, T>{std::move(d)}, multi_dim_array_view_helper<Dims, T>{ this->storage.data(), t } {} }; </code></pre> now you can do this: <pre class="prettyprint"><code>std::vector<float> blah = { 01.f, 02.f, 03.f, 11.f, 12.f, 13.f, 21.f, 22.f, 23.f, }; multi_dim_array_view<2, float> view = { {3}, blah.data() }; for (std::size_t i = 0; i < 3; ++i ) { std::cout << "["; for (std::size_t j = 0; j < 3; ++j ) std::cout << view[i][j] << ","; std::cout << "]\n"; } </code></pre> live example No data is copied in the view class. It just provides a view of the flat array that is a multi-dimensional array.

Your approaches are quite different: <ul> <li>In the "dynamic array" version you allocate a single chunk of memory for each matrix and map the rows of the matrices onto that one dimensional memory range.</li> <li>In the "vector" version you use vectors of vectors which are "real" and "dynamically" two dimensional meaning that the storage position of each row of your matrices is unrelated with respect to the other rows.</li> </ul> What you probably want to do is: <ul> <li>Use <code>vector<float>(size*size)</code> and perform the very same mapping you're doing in the "dynamic array" example by hand or</li> <li>Write a class that internally handles the mapping for you and provides a 2-dimensional access interface (<code>T& operator()(size_t, size_t)</code> or some kind of <code>row_proxy operator[](size_t)</code> where <code>row_proxy</code> in turn has <code>T& operator[](size_t)</code>)</li> </ul>

Why does C++ multiplication with dynamic array work better than std::vector version

Tags:

c++

arrays

c++11

matrix

openmp

I am implementing the C++ multiplication for matrices with different data structures and techniques (vectors , arrays and OpenMP) and I found a strange situation... My dynamic array version is working better:

times:

openmp mult_1: time: 5.882000 s

array mult_2: time: 1.478000 s

My compilation flags are:

/usr/bin/g++ -fopenmp -pthread -std=c++1y -O3

C++ vector version

typedef std::vector<std::vector<float>> matrix_f;
void mult_1 (const matrix_f &  matrixOne, const matrix_f & matrixTwo, matrix_f & result) {
    const int matrixSize = (int)result.size();
    #pragma omp parallel for simd
    for (int rowResult = 0; rowResult < matrixSize; ++rowResult) {
        for (int colResult = 0; colResult < matrixSize; ++colResult) {
            for (int k = 0; k < matrixSize; ++k) {
                result[rowResult][colResult] += matrixOne[rowResult][k] * matrixTwo[k][colResult];  
            }
        }
    }
}

Dynamic array version

void mult_2 ( float *  matrixOne, float * matrixTwo,  float * result, int size)  {
    for (int row = 0; row < size; ++row) {
        for (int col = 0; col < size; ++col) {
            for (int k = 0; k < size; ++k) {
                (*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col));
            }
        }
    }
}

tests:

C++ vector version

utils::ChronoTimer timer;
/* set Up simple matrix */
utils::matrix::matrix_f matr1 = std::vector<std::vector<float>>(size,std::vector<float>(size));
fillRandomMatrix(matr1);

utils::matrix::matrix_f matr2 = std::vector<std::vector<float>>(size,std::vector<float>(size));
fillRandomMatrix(matr2);

utils::matrix::matrix_f result = std::vector<std::vector<float>>(size,std::vector<float>(size));    
timer.init();
utils::matrix::mult_1(matr1,matr2,result);
std::printf("openmp mult_1: time: %f ms\n",timer.now() / 1000);

Dynamic array version

utils::ChronoTimer timer;

float *p_matr1 = new float[size*size];
float *p_matr2 = new float[size*size];
float *p_result = new float[size*size];

fillRandomMatrixArray(p_matr1,size);
fillRandomMatrixArray(p_matr2,size);

timer.init();
utils::matrix::mult_2(p_matr1,p_matr2,p_result,size);
std::printf("array mult_2: time: %f ms\n",timer.now() / 1000);

delete [] p_matr1;
delete [] p_matr2;
delete [] p_result;

I was checking some previous posts, but I couldn't find any related with my problem link, link2, link3:

UPDATE: I refactorized tests with the answers, and vector works slighty better :

vector mult: time: 1.194000 s

array mult_2: time: 1.202000 s

C++ vector version

void mult (const std::vector<float> &  matrixOne, const std::vector<float> & matrixTwo, std::vector<float> & result, int size) {
    for (int row = 0; row < size; ++row) {
        for (int col = 0; col < size; ++col) {
            for (int k = 0; k <size; ++k) {
                result[(size*row)+col] += matrixOne[(size*row)+k] * matrixTwo[(size*k)+col];
            }
        }
    }
}

Dynamic array version

void mult_2 ( float *  matrixOne, float * matrixTwo,  float * result, int size)  {
    for (int row = 0; row < size; ++row) {
        for (int col = 0; col < size; ++col) {
            for (int k = 0; k < size; ++k) {
                (*(result+(size*row)+col)) += (*(matrixOne+(size*row)+k)) * (*(matrixTwo+(size*k)+col));
            }
        }
    }
}

Also, my vectorized version is working better(0.803 s);

254

asked Feb 25 '16 11:02

carlos.baez

2 Answers

A vector of vectors is analogous to a jagged array -- an array where each entry is a pointer, each pointer pointing at another array of floats.

In comparison, the raw array version is one block of memory, where you do math to find the elements.

Use a single vector, not a vector of vectors, and do the math manually. Or, use a vector of fix-sized std::arrays. Or write a helper type that takes the (one-dimensional) vector of float, and gives you a 2 dimensional view of it.

Data in a contiguous buffer is more cache and optimization friendly than data in a bunch of disconnected buffers.

template<std::size_t Dim, class T>
struct multi_dim_array_view_helper {
  std::size_t const* dims;
  T* t;
  std::size_t stride() const {
    return
      multi_dim_array_view_helper<Dim-1, T>{dims+1, nullptr}.stride()
      * *dims;
  }
  multi_dim_array_view_helper<Dim-1, T> operator[](std::size_t i)const{
    return {dims+1, t+i*stride()};
  }
};
template<class T>
struct multi_dim_array_view_helper<1, T> {
  std::size_t stride() const{ return 1; }
  T* t;
  T& operator[](std::size_t i)const{
    return t[i];
  }
  multi_dim_array_view_helper( std::size_t const*, T* p ):t(p) {}
};
template<std::size_t Dims>
using dims_t = std::array<std::size_t, Dims-1>;
template<std::size_t Dims, class T>
struct multi_dim_array_view_storage
{
  dims_t<Dims> storage;
};
template<std::size_t Dims, class T>
struct multi_dim_array_view:
  multi_dim_array_view_storage<Dims, T>,
  multi_dim_array_view_helper<Dims, T>
{
  multi_dim_array_view( dims_t<Dims> d, T* t ):
    multi_dim_array_view_storage<Dims, T>{std::move(d)},
    multi_dim_array_view_helper<Dims, T>{
      this->storage.data(), t
    }
  {}
};

now you can do this:

std::vector<float> blah = {
   01.f, 02.f, 03.f,
   11.f, 12.f, 13.f,
   21.f, 22.f, 23.f,
};

multi_dim_array_view<2, float> view = { {3}, blah.data() };
for (std::size_t i = 0; i < 3; ++i )
{
  std::cout << "[";
  for (std::size_t j = 0; j < 3; ++j )
    std::cout << view[i][j] << ",";
  std::cout << "]\n";
}

live example

No data is copied in the view class. It just provides a view of the flat array that is a multi-dimensional array.

answered Oct 28 '22 10:10

Yakk - Adam Nevraumont

Your approaches are quite different:

In the "dynamic array" version you allocate a single chunk of memory for each matrix and map the rows of the matrices onto that one dimensional memory range.
In the "vector" version you use vectors of vectors which are "real" and "dynamically" two dimensional meaning that the storage position of each row of your matrices is unrelated with respect to the other rows.

What you probably want to do is:

Use vector<float>(size*size) and perform the very same mapping you're doing in the "dynamic array" example by hand or
Write a class that internally handles the mapping for you and provides a 2-dimensional access interface (T& operator()(size_t, size_t) or some kind of row_proxy operator[](size_t) where row_proxy in turn has T& operator[](size_t))

answered Oct 28 '22 11:10

Pixelchemist

Related questions
                            
                                boost::asio::yield_context: unexpected forced_unwind exception
                            
                                Are trigraphs still valid C++?
                            
                                Opening many small files on NTFS is way too slow
                            
                                How is it possible to get a reference to an rvalue?
                            
                                "using" keyword in C++
                            
                                Qt: qthread destroyed while thread is still running during closing
                            
                                Move constructor for std::string from char*
                            
                                Is the value of expression f() > g(), when f & g modify same global variable undefined or unspecified?
                            
                                Fastest C++ way to convert float to string
                            
                                how to handle arbitrary dimensional vector in c++?
                            
                                How does extern work in namespaces?
                            
                                const static auto lambda used with capture by reference
                            
                                How to disable automatic mnemonics in a Qt application on KDE?
                            
                                How to move unique_ptr to raw pointer?
                            
                                Why is it allowed to omit the first dimension, but not the other dimensions when declaring a multi-dimensonal array?
                            
                                Boost: unrecognised option for positional argument
                            
                                What can I do with C++ modules in Visual Studio 2015 so far? (using the experimental switch)
                            
                                How to avoid LLVM's Support CommandLine leaking library arguments?
                            
                                "gets() was not declared in this scope" error [duplicate]
                            
                                C++ - Copy Assignment Operator is Implicitly Deleted

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does C++ multiplication with dynamic array work better than std::vector version

Tags:

c++

arrays

c++11

matrix

openmp

carlos.baez

People also ask

2 Answers

Yakk - Adam Nevraumont

Pixelchemist

Recent Activity

Donate For Us