This is my little big question about containers, in particular, arrays.
I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double
coordinates each). I am looking for the best way (in term of performance) to implement a class that will contain a container for these data and that will provide manipulation primitives for these data (e.g. instantiation, operator[]
, etc.).
There are a few restrictions on how this set is used:
First conclusion, as the access on the elements is essentially done by accessing each element one by one with []
, I think that I should use a normal dynamic array.
I have explored a few things, and I would like to have your opinion on the one that can give me the best performances.
As I understand there is no advantage to use a dynamically allocated array instead of a std::vector
, so things like double** array2d = new ..., loop of new, etc
are ruled out.
So is it a good idea to use std::vector<double>
?
If I use a std::vector
, should I create a two dimensional array like std::vector<std::vector<double> > my_array
that can be indexed like my_array[i][j]
, or is it a bad idea and it would be better to use std::vector<double> other_array
and acces it with other_array[6*i+j]
.
Maybe this can gives better performance, especially as the number of columns is fixed and known from the beginning.
If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j]
without overhead (like function call at each access) ?
Another option, the one that I am using so far is to use Blitz, in particular blitz::Array
:
typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;
Where my elements are accessed like that: my_store(line, column);
.
I think there are not much advantage to use Blitz in my case because I am accessing each element one by one and that Blitz would be interesting if I was using operations directly on array (like matrix multiplication) which I am not.
Do you think that Blitz is OK, or is it useless in my case ?
These are the possibilities I have considered so far, but maybe the best one I still another one, so don't hesitate to suggest me other things.
Thanks a lot for your help on this problem !
Edit:
From the very interesting answers and comments bellow a good solution seems to be the following:
particle
(containing 6 doubles) or a static array of 6 doubles (this avoid the use of two dimensional dynamic arrays)vector
or a deque
of this particle
structure or array. It is then good to traverse them with iterators, and that will allow to change from one to another later.In addition I can also use a Blitz::TinyVector<double,6>
instead of a structure.
So is it a good idea to use
std::vector<double>
?
Usually, a std::vector
should be the first choice of container. You could use either std::vector<>::reserve()
or std::vector<>::resize()
to avoid reallocations while populating the vector. Whether any other container is better can be found by measuring. And only by measuring. But first measure whether anything the container is involved in (populating, accessing elements) is worth optimizing at all.
If I use a std::vector, should I create a two dimensional array like
std::vector<std::vector<double> >
[...]?
No. IIUC, you are accessing your data per particle, not per row. If that's the case, why not use a std::vector<particle>
, where particle
is a struct holding six values? And even if I understood incorrectly, you should rather write a two-dimensional wrapper around a one-dimensional container. Then align your data either in rows or columns - what ever is faster with your access patterns.
Do you think that Blitz is OK, or is it useless in my case?
I have no practical knowledge about blitz++ and the areas it is used in. But isn't blitz++ all about expression templates to unroll loop operations and optimizing away temporaries when doing matrix manipulations? ICBWT.
First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct
:
struct Particle { /* coords */ };
Then we can make a simple one dimensional array of these Particles
.
I would probably use a deque
, because that's the default container, but you may wish to try a vector
, it's just that 1.000.000 of particles means about a single chunk of a few MBs. It should hold but it might strain your system if this ever grows, while the deque
will allocate several chunks.
WARNING:
As Alexandre C remarked, if you go the deque
road, refrain from using operator[]
and prefer to use iteration style. If you really need random access and it's performance sensitive, the vector
should prove faster.
The first rule when choosing from containers is to use std::vector
. Then, only after your code is complete and you can actually measure performance, you can try other containers. But stick to vector first. (And use reserve()
from the start)
Then, you shouldn't use an std::vector<std::vector<double> >
. You know the size of your data: it's 6 doubles. No need for it to be dynamic. It is constant and fixed. You can define a struct to hold you particle members (the six doubles), or you can simply typedef it: typedef double particle[6]
. Then, use a vector of particles: std::vector<particle>
.
Furthermore, as your program uses the particle data contained in the vector sequentially, you will take advantage of the modern CPU cache read-ahead feature at its best performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With