What should I take into consideration when developing a game in terms of fast memory access in C++?
The data I load is static, so I should put it in a contiguous block of memory, right?
Also, how should I organize the variables inside structs to improve performance?
Registers are the fastest storage to access, but the old C register keyword is only a hint; modern compilers make register-allocation decisions themselves (and C++17 removed the keyword's meaning entirely), so this is not something you can usefully control.
The stack is faster than the heap because its access pattern makes allocation and deallocation trivial (a pointer/integer is simply incremented or decremented), while the heap involves much more complex bookkeeping for every allocation and free.
When a variable is created in C or C++, it is assigned a memory address: the location where its value is stored. Heap memory is slower to access than stack memory, it is much larger, and because heap data is visible to all threads it is not inherently thread-private the way each thread's own stack is.
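To tie this back to the question: yes, loading static data into one contiguous block (for example a single std::vector) is a good default, because a single up-front allocation that is then traversed linearly is friendly to the cache and the prefetcher. The Tile and Level types below are purely hypothetical, a minimal sketch rather than a recommendation for any particular engine:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical static level data kept in one contiguous heap block.
    struct Tile {
        std::uint16_t type;
        std::uint16_t flags;
    };

    struct Level {
        std::vector<Tile> tiles;  // all tiles live in one contiguous allocation

        explicit Level(std::size_t count) : tiles(count) {}
    };

Iterating Level::tiles walks memory linearly, whereas allocating each tile separately with new would scatter them across the heap.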
"Memory performance" on its own is extremely vague.
I think what you are actually looking for is how to handle the CPU cache, since there is a factor of roughly 10 between an access served from the cache and an access that has to go to main memory.
For a complete reference on the mechanisms behind the cache, you might wish to read Ulrich Drepper's excellent series of articles, "What every programmer should know about memory", on lwn.net.
In short:
Aim at Locality
You should not jump around in memory, so try (when possible) to group together items that will be used together.
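For example (a hedged sketch; the Particles type and its field names are invented here): if a per-frame update only ever touches positions and velocities, keeping those in their own contiguous arrays means the hot loop streams through exactly the data it needs:

    #include <cstddef>
    #include <vector>

    struct Particles {
        std::vector<float> x, y;    // hot: read and written every frame
        std::vector<float> vx, vy;  // hot: read every frame
        std::vector<int>   debugId; // cold: only touched occasionally
    };

    void update(Particles& p, float dt) {
        // The loop walks four tightly packed arrays; the cold data
        // never pollutes the cache during the update.
        for (std::size_t i = 0; i < p.x.size(); ++i) {
            p.x[i] += p.vx[i] * dt;
            p.y[i] += p.vy[i] * dt;
        }
    }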
Aim at Predictability
If your memory accesses are predictable, the CPU will likely prefetch the memory for the next chunk of work, so that it is available immediately, or shortly, after finishing the current chunk.
The typical example is with for loops on arrays:
// array is declared as int array[MAX][MAX]; C++ lays it out row by row
for (int i = 0; i != MAX; ++i)
    for (int j = 0; j != MAX; ++j)
        array[i][j] += 1;
Change array[i][j] += 1; to array[j][i] += 1; and each access jumps MAX ints ahead in memory instead of walking it linearly, and the performance varies accordingly... at low optimization levels, at least ;)
The compiler should catch those obvious cases, but some are more insidious. For example, using node-based containers (linked lists, binary search trees) instead of array-based containers (vector, some hash tables) may slow down the application, as sketched below.
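As an illustration (this snippet is mine, not part of the original answer): summing the same numbers from a std::vector and from a std::list does the same logical work, but the vector's elements are contiguous while the list's nodes may be scattered across the heap, so the list traversal tends to be dominated by cache misses:

    #include <list>
    #include <numeric>
    #include <vector>

    // Same algorithm, different memory layout underneath.
    long long sumVector(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0LL);  // linear, prefetch-friendly
    }

    long long sumList(const std::list<int>& l) {
        return std::accumulate(l.begin(), l.end(), 0LL);  // pointer chasing, node by node
    }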
Don't waste space... beware of false sharing
Try to pack your structures. Because of alignment requirements, padding can creep in between members, artificially inflating the structure size and wasting cache space.
A typical rule of thumb is to order the members of the structure by decreasing size (check with sizeof). It is a dumb rule, but it works well. If you know the sizes and alignments involved, just avoid the holes yourself :) Note: this is only worthwhile for structures that exist in lots of instances...
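A quick illustration of that rule of thumb (exact sizes depend on the platform ABI; the numbers in the comments assume a typical 64-bit target):

    #include <cstdint>

    struct Padded {            // typically sizeof == 24
        std::uint8_t  a;       // 1 byte, then 7 bytes of padding before b
        double        b;       // 8 bytes, needs 8-byte alignment
        std::uint16_t c;       // 2 bytes, then 6 bytes of tail padding
    };

    struct Reordered {         // typically sizeof == 16
        double        b;       // largest member first
        std::uint16_t c;
        std::uint8_t  a;       // small members fill what used to be padding
    };

    static_assert(sizeof(Reordered) <= sizeof(Padded),
                  "decreasing-size order wastes no extra space here");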
However, beware of false sharing. In multi-threaded programs, concurrent access to two variables that are close enough to share the same cache line is costly, because it causes a lot of cache-line invalidation as the cores battle for ownership of the line.
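A common mitigation, sketched below with a hard-coded 64-byte line size (an assumption; C++17 exposes std::hardware_destructive_interference_size in <new> as a portable hint), is to align per-thread data so that each piece gets its own cache line:

    #include <atomic>

    struct Counters {
        // Each counter sits on its own 64-byte cache line, so two threads
        // incrementing "their own" counter no longer invalidate each other.
        alignas(64) std::atomic<long> countA{0};
        alignas(64) std::atomic<long> countB{0};
    };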
Profile
Unfortunately, this is HARD to figure out.
If you happen to be programming on Unix, Callgrind (part of the Valgrind suite) can be run with cache simulation enabled to identify the parts of the code triggering cache misses.
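For example (treat this as a sketch; exact flags and output names can vary with your Valgrind version):

    valgrind --tool=callgrind --cache-sim=yes ./your_game
    callgrind_annotate callgrind.out.<pid>

The first command records simulated cache hits and misses per function; the second prints an annotated report.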
I guess there are other tools; I have just never used them.