Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the Disadvantages of Nested Vectors?

I'm still fairly new to C++ and have a lot left to learn, but something that I've become quite attached to recently is using nested (multidimensional) vectors. So I may typically end up with something like this:

std::vector<std::vector<std::string> > table;

Which I can then easily access elements of like this:

std::string data = table[3][5];

However, recently I've been getting the impression that it's better (in terms of performance) to have a single-dimensional vector and then just use "index arithmetic" to access elements correspondingly. I assume this performance impact is significant for much larger or higher dimensional vectors, but I honestly have no idea and haven't been able to find much information about it so far.

While, intuitively, it kind of makes sense that a single vector would have better performance than a a higher dimensional one, I honestly don't understand the actual reasons why. Furthermore, if I were to just use single-dimensional vectors, I would lose the intuitive syntax I have for accessing elements of multidimensional ones. So here are my questions:

Why are multidimensional vectors inefficient? If I were to only use a single-dimensional vector instead (to represent data in higher dimensions), what would be the best, most intuitive way to access its elements?

like image 296
JohnTravolski Avatar asked Jul 26 '17 03:07

JohnTravolski


1 Answers

It depends on the exact conditions. I'll talk about the case, when the nested version is a true 2D table (i.e., all rows have equal length).

A 1D vector usually will be faster on every usage patterns. Or, at least, it won't be slower than the nested version.

Nested version can be considered worse, because:

  • it needs to allocate number-of-rows times, instead of one.
  • accessing an element takes an additional indirection, so it is slower (additional indirection is usually slower than the multiply needed in the 1D case)
  • if you process your data sequentially, then it could be much slower, if the 2D data is scattered around the memory. It is because there could be a lot of cache misses, depending how the memory allocator returns memory areas of different rows.

So, if you go for performance, I'd recommend you to create a 2D-wrapper class for 1D vector. This way, you could get as simple API as the nested version, and you'll get the best performance too. And even, if for some cause, you decide to use the nested version instead, you can just change the internal implementation of this wrapper class.

The most intuitive way to access 1D elements is y*width+x. But, if you know your access patterns, you can choose a different one. For example, in a painting program, a tile based indexing could be better for storing and manipulating the image. Here, data can be indexed like this:

int tileMask = (1<<tileSizeL)-1; // tileSizeL is log of tileSize
int tileX = x>>tileSizeL;
int tileY = y>>tileSizeL;
int tileIndex = tileY*numberOfTilesInARow + tileX;
int index = (tileIndex<<(tileSizeL*2)) + ((y&tileMask)<<tileSizeL) + (x&tileMask);

This method has a better spatial locality in memory (pixels near to each other tend to have a near memory address). Index calculation is slower than a simple y*width+x, but this method could have much less cache misses, so in the end, it could be faster.

like image 120
geza Avatar answered Sep 28 '22 07:09

geza