Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

1D or 2D array, what's faster?

Tags:

c++

arrays

c

I'm in need of representing a 2D field (axes x, y) and I face a problem: Should I use an 1D array or a 2D array?

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y) but I could image that 1D could be in CPU cache..

I did some googling, but only found pages regarding static array (and stating that 1D and 2D are basically the same). But my arrays must me dynamic.

So, what's

  1. faster,
  2. smaller (RAM)

dynamic 1D arrays or dynamic 2D arrays?

like image 567
graywolf Avatar asked Jun 23 '13 10:06

graywolf


People also ask

Is 1D better than 2D?

1D Barcoding and Scanners 1D barcode scanners can only scan 1D barcodes. Their range, however, is 50% greater than a 2D imager, and they have better motion tolerance. This makes them a good choice if your staff will need to scan items from a distance or while moving along in a cart.

What is the difference between 1D and 2D array?

A one-dimensional array stores a single list of various elements having a similar data type. A two-dimensional array stores an array of various arrays, or a list of various lists, or an array of various one-dimensional arrays. It represents multiple data items in the form of a list.

How is 1D different from 2D?

The difference between 1D, 2D and 3D geophysical measurements is related to how you measure and process the data you collect. For 1D measurements data are only collected beneath a single point at the surface, for 2D a profile is measured and, for 3D, data from across a volume of ground is collected.

Why do we need 2D arrays What is the difference between 1D and 2D arrays?

The main difference between 1D and 2D array is that the 1D array represents multiple data items as a list while 2D array represents multiple data items as a table consisting of rows and columns. A variable is a memory location to store data of a specific type.


1 Answers

tl;dr : You should probably use a one-dimensional approach.

Note: One cannot dig into detail affecting performance when comparing dynamic 1d or dynamic 2d storage patterns without filling books since the performance of code is dependent one a very large number of parameters. Profile if possible.

1. What's faster?

For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.

2. What's smaller?

Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.

Remarks

I laid out a pretty long answer beneath with several reasons but I want to make some remarks on your assumptions first.

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y)

Let's compare these two functions:

int get_2d (int **p, int r, int c) { return p[r][c]; } int get_1d (int *p, int r, int c)  { return p[c + C*r]; } 

The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:

?get_1d@@YAHPAHII@Z PROC push    ebp mov ebp, esp mov eax, DWORD PTR _c$[ebp] lea eax, DWORD PTR [eax+edx*4] mov eax, DWORD PTR [ecx+eax*4] pop ebp ret 0  ?get_2d@@YAHPAPAHII@Z PROC push ebp mov ebp, esp mov ecx, DWORD PTR [ecx+edx*4] mov eax, DWORD PTR _c$[ebp] mov eax, DWORD PTR [ecx+eax*4] pop ebp ret 0 

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a a maximum throughput of 2 per cycle while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle. (According to Instruction tables - Agner Fog Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely to identify this difference itself to be the bottleneck in any program.

This brings us to the next (and more interesting) point:

... but I could image that 1D could be in CPU cache ...

True, but 2d could be in CPU cache, too. See The Downsides: Memory locality for an explanation why 1d is still better.

The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.

Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.

The Problem

To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.

Example case using pointer to pointer syntax

int main (void) {     // allocate memory for 4x4 integers; quick & dirty     int ** p = new int*[4];     for (size_t i=0; i<4; ++i) p[i] = new int[4];       // do some stuff here, using p[x][y]       // deallocate memory     for (size_t i=0; i<4; ++i) delete[] p[i];     delete[] p; } 

The downsides

Memory locality

For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in an arbitrary memory position.

The following image will give you an idea of how the memory may look like.

For the real 2d case:

  • The violet square is the memory position occupied by p itself.
  • The green squares assemble the memory region p points to (4 x int*).
  • The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region

For the 2d mapped on 1d case:

  • The green square is the only required pointer int *
  • The blue squares ensemble the memory region for all matrix elements (16 x int).

Real 2d vs mapped 2d memory layout

This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching for instance.

Let's say a cache line is "the amount of data transfered into the cache at once" and let's imagine a program accessing the whole matrix one element after another.

If you have a properly aligned 4 times 4 matrix of 32 bit values, a processor with a 64 byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.

In case of a dynamic, "real two-dimensional" system with unrelated locations of each row/column, the processor needs to load every memory location seperately. Eventhough only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would -in a worst case scenario- actually transfer 256 bytes and waste 75% throughput bandwidth. If you process the data using the 2d-scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/colum will be in the cache after the first load from main memory because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.

Long story short: The 2d pattern has a higher chance of cache misses with the 1d scheme offering better potential for performance due to locality of the data.

Frequent Allocation / Deallocation

  • As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
  • The same number of proper, respective deallocation operations must be applied as well.

Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.

This is getting even worse with a growing number of rows.

Memory consumption overhead

I'll asumme a size of 32 bits for int and 32 bits for pointers. (Note: System dependency.)

Let's remember: We want to store a 4×4 int matrix which means 64 bytes.

For a NxM matrix, stored with the presented pointer-to-pointer scheme we consume

  • N*M*sizeof(int) [the actual blue data] +
  • N*sizeof(int*) [the green pointers] +
  • sizeof(int**) [the violet variable p] bytes.

That makes 4*4*4 + 4*4 + 4 = 84 bytes in case of the present example and it gets even worse when using std::vector<std::vector<int>>. It will require N * M * sizeof(int) + N * sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, intead of 64 bytes for 4 x 4 int.

In addition -depending on the used allocator- each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some “Infobytes” which store the number of allocated bytes for the purpose of proper deallocation.)

This means the worst case is:

N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes ! _Overhead: 156%_

The share of the overhead will reduce as the size of the matrix grows but will still be present.

Risk of memory leaks

The bunch of allocations requires an appropriate exception handling in order to avoid memory leaks if one of the allocations will fail! You’ll need to keep track of allocated memory blocks and you must not forget them when deallocating the memory.

If new runs of of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the above mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

  // allocate memory for 4x4 integers; quick & dirty   size_t const N = 4;   // we don't need try for this allocation   // if it fails there is no leak   int ** p = new int*[N];   size_t allocs(0U);   try    { // try block doing further allocations     for (size_t i=0; i<N; ++i)      {       p[i] = new int[4]; // allocate       ++allocs; // advance counter if no exception occured     }   }   catch (std::bad_alloc & be)   { // if an exception occurs we need to free out memory     for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all alloced p[i]s     delete[] p; // free p     throw; // rethrow bad_alloc   }   /*      do some stuff here, using p[x][y]    */   // deallocate memory accoding to the number of allocations   for (size_t i=0; i<allocs; ++i) delete[] p[i];   delete[] p; 

Summary

There are cases where "real 2d" memory layouts fit and make sense (i.e. if the number of columns per row is not constant) but in the most simple and common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

Alternative

You should use a contiguous block of memory and map your rows onto that block.

The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like

  • What is The Rule of Three?
  • What is meant by Resource Acquisition is Initialization (RAII)?
  • C++ concept: Container (on cppreference.com)

Example

To provide an idea of how such a class may look like, here's a simple example with some basic features:

  • 2d-size-constructible
  • 2d-resizable
  • operator(size_t, size_t) for 2d- row major element access
  • at(size_t, size_t) for checked 2d-row major element access
  • Fulfills Concept requirements for Container

Source:

#include <vector> #include <algorithm> #include <iterator> #include <utility>  namespace matrices {    template<class T>   class simple   {   public:     // misc types     using data_type  = std::vector<T>;     using value_type = typename std::vector<T>::value_type;     using size_type  = typename std::vector<T>::size_type;     // ref     using reference       = typename std::vector<T>::reference;     using const_reference = typename std::vector<T>::const_reference;     // iter     using iterator       = typename std::vector<T>::iterator;     using const_iterator = typename std::vector<T>::const_iterator;     // reverse iter     using reverse_iterator       = typename std::vector<T>::reverse_iterator;     using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;      // empty construction     simple() = default;      // default-insert rows*cols values     simple(size_type rows, size_type cols)       : m_rows(rows), m_cols(cols), m_data(rows*cols)     {}      // copy initialized matrix rows*cols     simple(size_type rows, size_type cols, const_reference val)       : m_rows(rows), m_cols(cols), m_data(rows*cols, val)     {}      // 1d-iterators      iterator begin() { return m_data.begin(); }     iterator end() { return m_data.end(); }     const_iterator begin() const { return m_data.begin(); }     const_iterator end() const { return m_data.end(); }     const_iterator cbegin() const { return m_data.cbegin(); }     const_iterator cend() const { return m_data.cend(); }     reverse_iterator rbegin() { return m_data.rbegin(); }     reverse_iterator rend() { return m_data.rend(); }     const_reverse_iterator rbegin() const { return m_data.rbegin(); }     const_reverse_iterator rend() const { return m_data.rend(); }     const_reverse_iterator crbegin() const { return m_data.crbegin(); }     const_reverse_iterator crend() const { return m_data.crend(); }      // element access (row major indexation)     reference operator() (size_type const row,       size_type const column)     {       return m_data[m_cols*row + column];     }     const_reference operator() (size_type const row,       size_type const column) const     {       return m_data[m_cols*row + column];     }     reference at() (size_type const row, size_type const column)     {       return m_data.at(m_cols*row + column);     }     const_reference at() (size_type const row, size_type const column) const     {       return m_data.at(m_cols*row + column);     }      // resizing     void resize(size_type new_rows, size_type new_cols)     {       // new matrix new_rows times new_cols       simple tmp(new_rows, new_cols);       // select smaller row and col size       auto mc = std::min(m_cols, new_cols);       auto mr = std::min(m_rows, new_rows);       for (size_type i(0U); i < mr; ++i)       {         // iterators to begin of rows         auto row = begin() + i*m_cols;         auto tmp_row = tmp.begin() + i*new_cols;         // move mc elements to tmp         std::move(row, row + mc, tmp_row);       }       // move assignment to this       *this = std::move(tmp);     }      // size and capacity     size_type size() const { return m_data.size(); }     size_type max_size() const { return m_data.max_size(); }     bool empty() const { return m_data.empty(); }     // dimensionality     size_type rows() const { return m_rows; }     size_type cols() const { return m_cols; }     // data swapping     void swap(simple &rhs)     {       using std::swap;       m_data.swap(rhs.m_data);       swap(m_rows, rhs.m_rows);       swap(m_cols, rhs.m_cols);     }   private:     // content     size_type m_rows{ 0u };     size_type m_cols{ 0u };     data_type m_data{};   };   template<class T>   void swap(simple<T> & lhs, simple<T> & rhs)   {     lhs.swap(rhs);   }   template<class T>   bool operator== (simple<T> const &a, simple<T> const &b)   {     if (a.rows() != b.rows() || a.cols() != b.cols())     {       return false;     }     return std::equal(a.begin(), a.end(), b.begin(), b.end());   }   template<class T>   bool operator!= (simple<T> const &a, simple<T> const &b)   {     return !(a == b);   }  } 

Note several things here:

  • T needs to fulfill the requirements of the used std::vector member functions
  • operator() doesn't do any "of of range" checks
  • No need to manage data on your own
  • No destructor, copy constructor or assignment operators required

So you don't have to bother about proper memory handling for each application but only once for the class you write.

Restrictions

There may be cases where a dynamic "real" two dimensional structure is favourable. This is for instance the case if

  • the matrix is very large and sparse (if any of the rows do not even need to be allocated but can be handled using a nullptr) or if
  • the rows do not have the same number of columns (that is if you don't have a matrix at all but another two-dimensional construct).
like image 111
Pixelchemist Avatar answered Oct 02 '22 13:10

Pixelchemist