Optimising C++ 2-D arrays

I need a way to represent a 2-D array (a dense matrix) of doubles in C++ with the absolute minimum of access overhead.

I've done some timing on various Linux/Unix machines and GCC versions. An STL vector of vectors, declared as:

vector<vector<double> > matrix(n,vector<double>(n));

and accessed through matrix[i][j] is between 5% and 100% slower to access than an array declared as:

double *matrix = new double[n*n];

accessed through an inlined index function matrix[index(i,j)], where index(i,j) evaluates to i+n*j. Other ways of arranging a 2-D array without the STL (an array of n pointers to the start of each row, or defining the whole thing on the stack as a constant-size matrix[n][n]) run at almost exactly the same speed as the index function method.

Recent GCC versions (> 4.0) seem to be able to compile the STL vector-of-vectors to nearly the same efficiency as the non-STL code when optimisations are turned on, but this is somewhat machine-dependent.

I'd like to use STL if possible, but will have to choose the fastest solution. Does anyone have any experience in optimising STL with GCC?

If you're using GCC, the compiler can analyze your matrix accesses and, in certain cases, change the order in which the matrix is laid out in memory. The relevant compiler flag is:

-fipa-matrix-reorg

Perform matrix flattening and transposing. Matrix flattening tries to replace an m-dimensional matrix with its equivalent n-dimensional matrix, where n < m. This reduces the level of indirection needed for accessing the elements of the matrix. The second optimization is matrix transposing, which attempts to change the order of the matrix's dimensions in order to improve cache locality. Both optimizations need the -fwhole-program flag. Transposing is enabled only if profiling information is available.

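Note that this option is not enabled by -O2 or -O3. You have to pass it yourself. As far as I know, the option was removed again in GCC 4.8, so this only applies to older releases; check the documentation for your version.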

My guess is that the fastest approach for a matrix is to use a 1-D STL container (such as a vector) and overload operator() so that it can be used as a 2-D matrix.
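As a rough sketch of that idea, assuming C++11 and the i + n*j layout from the question: the class name Matrix2D and its member names are made up for illustration and are not from any library. The asserts are only there for debug builds and compile away when NDEBUG is defined, so they should not add access overhead in an optimised build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (name and interface are illustrative assumptions).
// One contiguous std::vector holds every element; operator() maps (i, j)
// to a flat offset using the question's i + n*j layout.
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Bounds are checked only in debug builds (assert vanishes under NDEBUG).
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i + rows_ * j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + rows_ * j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw pointer to the contiguous storage (vector::data() needs C++11).
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

Usage then looks like `Matrix2D m(n, n); m(i, j) = 1.0;`. Since operator() is defined in the class body it is implicitly inline, so a compiler with optimisations on has the chance to reduce it to the same address arithmetic as the hand-written index function.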

However, the STL also defines a type specifically for non-resizeable numerical arrays: valarray. It also provides various optimisations for in-place operations.

valarray takes a numerical type as its template argument:

valarray<double> a;

Then you can use slices, indirect arrays and so on, and of course you can inherit from valarray and define your own operator()(int i, int j) for 2-D arrays.
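Here is a minimal sketch of the valarray route. Note that it swaps inheritance for composition: the hypothetical ValMatrix class below holds a valarray as a member rather than deriving from it, because std::valarray has no virtual destructor. The class and method names are assumptions for illustration, and the layout is again i + n*j.

```cpp
#include <cstddef>
#include <valarray>

// Hypothetical class: wraps a std::valarray instead of inheriting from it.
class ValMatrix {
public:
    ValMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(0.0, rows * cols) {}  // (value, count)

    double& operator()(std::size_t i, std::size_t j) { return data_[i + rows_ * j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i + rows_ * j]; }

    // Column j is contiguous in this layout: start j*rows_, length rows_, stride 1.
    std::valarray<double> column(std::size_t j) const {
        return data_[std::slice(j * rows_, rows_, 1)];
    }

    // Row i is strided: start i, length cols_, stride rows_.
    std::valarray<double> row(std::size_t i) const {
        return data_[std::slice(i, cols_, rows_)];
    }

    // In-place element-wise scaling, supplied by valarray itself.
    void scale(double factor) { data_ *= factor; }

private:
    std::size_t rows_, cols_;
    std::valarray<double> data_;
};
```

The row() and column() helpers return copies. If you need to write through a slice, the non-const `data_[std::slice(...)]` yields a std::slice_array that can be assigned to instead.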

Very likely this is a locality-of-reference issue. vector uses new to allocate its internal array, so each row will be at least a little apart in memory due to each block's header; the rows could be a long distance apart if memory is already fragmented when you allocate them. Moving between rows of the array is likely to incur at least a cache miss and could incur a page fault; if you're really unlucky, two adjacent rows could be on memory lines that share a TLB slot, and accessing one will evict the other.

In contrast, your other solutions guarantee that all the data is adjacent. It could help your performance if you align the structure so it crosses as few cache lines as possible.
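If you want to see the difference on your own machine, something like the following sketch could help. Both helper names are made up for illustration, and C++11 is assumed for vector::data(). The first function checks whether the rows of a vector of vectors happen to sit back to back, which the standard does not promise; the second shows that in a flat buffer the start of each row is fixed by construction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every row starts exactly where the previous
// row ends. Each inner vector makes its own allocation, so nothing guarantees
// this, and the result depends on the allocator and the heap's state.
bool rows_are_adjacent(const std::vector<std::vector<double> >& m) {
    for (std::size_t r = 1; r < m.size(); ++r) {
        const double* prev_end = m[r - 1].data() + m[r - 1].size();
        if (m[r].data() != prev_end) return false;
    }
    return true;
}

// In a flat buffer of rows of length n, row r always starts at data + r*n.
const double* flat_row(const std::vector<double>& flat, std::size_t n, std::size_t r) {
    return flat.data() + r * n;
}
```

Printing the addresses returned by `m[r].data()` for successive rows is a quick way to see how far apart the allocator actually placed them.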

vector is designed for resizable arrays. If you don't need to resize the arrays, use a regular C++ array. STL operations can generally operate on C++ arrays.
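For example, pointers into a plain array are valid random-access iterators, so the standard algorithms accept them directly. A small sketch (the function names are illustrative only):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>

// Sum all elements of a flat array; [data, data + count) is the iterator range.
double sum_flat(const double* data, std::size_t count) {
    return std::accumulate(data, data + count, 0.0);
}

// Sort a flat array in place in ascending order.
void sort_flat(double* data, std::size_t count) {
    std::sort(data, data + count);
}
```

The same calls work on a buffer obtained from `new double[n*n]`, since only the begin and end pointers matter to the algorithm.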

Do be sure to walk the array in the correct direction, i.e. across (consecutive memory addresses) rather than down. This will reduce cache misses.
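To make the direction concrete for the i + n*j layout used in the question (where consecutive values of i are adjacent in memory), here is a sketch of the two loop orders. The function names are made up; both compute the same sum over an n-by-n matrix, and only the memory access pattern differs.

```cpp
#include <cstddef>
#include <vector>

// Inner loop varies i, so successive accesses are one double apart.
double sum_unit_stride(const std::vector<double>& m, std::size_t n) {
    double total = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            total += m[i + n * j];
    return total;
}

// Inner loop varies j, so successive accesses are n doubles apart.
double sum_strided(const std::vector<double>& m, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            total += m[i + n * j];
    return total;
}
```

How much the second version costs depends on n and on the cache sizes of the machine, so it is worth timing both on the target hardware. With a row-major layout (i*n + j) the roles of the two loops swap.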

Tags: c++, linux, optimization, gcc, stl

Chris Johnson

3 Answers

Nils Pipenbrinck

PierreBdR

Mike Dimmick
