I am creating a program which needs to be ultra-fast.
It is running some stuff on the GPU using CUDA and afterwards it does some calculations on the CPU. For this, I need to convert the highly optimized GPU-datastructure to something that I can easily use on the CPU. My data is basically a graph laid out in a grid.
Currently I am using std::vector for the CPU part. Because I know there is quite an overhead if I do a lot of push_back()
s and I at least know because I know how many vertices I have in my graph, I now use the following code for this:
new_graph.resize(blockSize * blockSize);
for (unsigned long long y = 0; y < blockSize; y++) {
for (unsigned long long x = 0; x < blockSize; x++) {
int idx = y * blockSize + x;
new_graph[idx] = Vertex(x, y);
}
}
Afterwards I add the edges. Unfortunately I do not know how many edges per vertex I have, but I do know that it will never be bigger than 8. Therefore I reserve()
8 in each std::vector that I use for the edges.
However, this both seem to be extremely slow. If I use a normal array for the graph itself (so basically replacing the outer std::vector), the speed improvement in that part is enormous (like 10x or so).
For the graph this is doable, but for the edges not really, because I do some post-procsesing on these edges and for this I really need something like std::vector which is kinda dynamic (I add some edges).
Currently converting the data to std::vector's is something like 10 time slower than running my algorithm on the GPU (which is a smart MST algorithm). This is not really what I want, because now the overhead is way too big.
Does someone know what is going on or how I can fix this?
p.s. I compile with -O2, because I already found out that that can make a big difference. Also tried with -O3, no real difference.
Vertex is defined as follows:
struct Pos {
int x, y;
Pos() {
x = 0;
y = 0;
}
Pos(int x, int y) {
this->x = x;
this->y = y;
}
};
struct Vertex {
Pos pos;
bool hidden;
unsigned long long newIdx;
Vertex() {
this->pos = Pos();
this->hidden = false;
this->numEdges = 0;
this->numRemovedEdges = 0;
}
Vertex(Pos &pos) {
this->pos = pos;
this->hidden = false;
this->numEdges = 0;
this->numRemovedEdges = 0;
}
Vertex(int x, int y) {
this->pos = Pos(x, y);
this->hidden = false;
this->numEdges = 0;
this->numRemovedEdges = 0;
}
int numEdges;
int numRemovedEdges;
std::vector<Edge> edges;
std::vector<bool> removed;
std::vector<bool> doNotWrite;
};
A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.
Vector is better for frequent insertion and deletion, whereas Arrays are much better suited for frequent access of elements scenario. Vector occupies much more memory in exchange for managing storage and growing dynamically, whereas Arrays are a memory-efficient data structure.
Vector are implemented as dynamic arrays with list interface whereas arrays can be implemented as statically or dynamically with primitive data type interface. Size of arrays are fixed whereas the vectors are resizable i.e they can grow and shrink as vectors are allocated on heap memory.
A Vector is a sequential-based container whereas an array is a data structure that stores a fixed number of elements (elements should of the same type) in sequential order. Vectors are sometimes also known as dynamic arrays.
Perhaps you are paying for a dynamic memory allocation that vector
does to reserve the space for its elements?
Even if you reserve
optimally, you'll have at least 3 memory allocations for each and every Vertex
(one for edges
, one for removed
and one for doNotWrite
). Dynamic memory allocation is potentially expensive relative to high-performance stuff you are trying to do here.
Either use plain old arrays that are guaranteed to be large enough (potentially wasting space), or a specialized memory allocator together with vector
, tailored to your specific needs.
Also, do you access the elements in memory order? Your example seems to suggest so, but do you do it in all cases?
Also, do you even need Vertex.pos
? Can't it be inferred from the Vertex
's position in the grid?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With