I'm writing a C++14 JSON library as an exercise and to use it in my personal projects.
By using callgrind I've discovered that the current bottleneck during a continuous value creation from string stress test is a std::string dynamic memory allocation. Precisely, the bottleneck is the call to malloc(...) made from std::string::reserve.
I've read that many existing JSON libraries such as rapidjson use custom allocators to avoid malloc(...) calls during string memory allocations.
I tried to analyze rapidjson's source code but the large amount of additional code and comments, plus the fact that I'm not really sure what I'm looking for, didn't help me much.
std::strings take available memory from it?
Code notes:
Str is an alias for std::string.
Strings Are Dynamically Allocated
To implement this flexibility, strings are allocated dynamically. Dynamic allocation is expensive compared to most other C++ features, so no matter what, strings are going to show up as optimization hot spots.
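With libc++ on a 64-bit platform, std::string has a size of 24 bytes and allows strings of up to 22 characters(!!) with no allocation. Other implementations differ: libstdc++, for example, uses a 32-byte object with room for 15 characters.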
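Since the inline capacity depends on the standard library in use, it can be worth measuring it instead of relying on a quoted number. The following is a minimal sketch; stored_inline and inline_capacity are hypothetical helper names, and the check simply tests whether the character buffer lies inside the string object itself.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Returns true if the character buffer of `s` lives inside the std::string
// object itself (small-string optimization), so no heap block is in use.
// std::less gives a well-defined total order over unrelated pointers.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    const char* data = s.data();
    std::less<const char*> before;
    return !before(data, begin) && before(data, end);
}

// Largest length that still fits inline on this implementation, found by
// probing. The result is implementation-specific.
std::size_t inline_capacity() {
    std::size_t n = 0;
    while (stored_inline(std::string(n + 1, 'x'))) {
        ++n;
    }
    return n;
}
```

For a JSON library this suggests that short keys and short values may never reach malloc at all, so the allocation cost is likely dominated by the longer strings.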
Allocators handle all the requests for allocation and deallocation of memory for a given container. The C++ Standard Library provides general-purpose allocators that are used by default, however, custom allocators may also be supplied by the programmer.
std::allocator is the default memory allocator for the standard library containers, and you can substitute your own allocators. This allows you to control how the standard containers allocate memory.
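To make "substitute your own allocator" concrete, here is a minimal sketch of the C++11-style allocator interface plugged into std::basic_string. CountingAllocator, CountedString, g_allocations and allocations_for are hypothetical names for this example; the allocator still forwards to malloc and only counts calls, so it shows the plumbing rather than a speed-up.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical global counter, only here to observe the allocator at work.
static std::size_t g_allocations = 0;

// Minimal allocator: value_type, allocate, deallocate, a converting
// constructor and equality operators. std::allocator_traits fills in the rest.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators of this family are always interchangeable.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// The allocator is the third template argument of std::basic_string.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Number of allocate() calls needed to build a string of the given length.
std::size_t allocations_for(std::size_t length) {
    const std::size_t before = g_allocations;
    CountedString s(length, 'x');
    return g_allocations - before;
}
```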
By default, std::string allocates memory as needed from the same heap as anything that you allocate with malloc or new. To get a performance gain from providing your own custom allocator, you will need to manage your own "chunk" of memory in such a way that your allocator can deal out the amounts of memory that your strings ask for faster than malloc does. Your memory manager will make relatively few calls to malloc (or new, depending on your approach) under the hood, requesting "large" amounts of memory at once, and then deal out sections of these memory blocks through the custom allocator. To actually achieve better performance than malloc, your memory manager will usually have to be tuned to the known allocation patterns of your use cases.
This kind of thing often comes down to the age-old trade-off of memory use versus execution speed. For example, if you have a known upper bound on your string sizes in practice, you can over-allocate so that every string can always accommodate the largest case. While this is wasteful of your memory resources, it can alleviate the performance overhead that more generalized allocation runs into with memory fragmentation. It also makes any calls to realloc essentially constant time for your purposes.
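One simple way to realize the "one big chunk, dealt out in pieces" idea is a bump (arena) allocator. The sketch below is an assumption about how that could look, not how rapidjson or any other library does it; Arena, ArenaAllocator, ArenaString and demo_arena_usage are hypothetical names. Individual deallocations are no-ops and all memory is released together, which fits a parse-then-discard workload but not long-lived strings that are freed one by one.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// One upfront allocation; requests are served by advancing an offset.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(std::size_t bytes, std::size_t align) {
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start + bytes > buffer_.size()) throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_.data() + start;
    }
    void reset() { used_ = 0; }  // releases everything at once
    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

// Stateful allocator that forwards to an Arena it does not own.
template <class T>
class ArenaAllocator {
public:
    using value_type = T;

    explicit ArenaAllocator(Arena& arena) noexcept : arena_(&arena) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena_(other.arena()) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // memory returns via Arena::reset
    Arena* arena() const noexcept { return arena_; }

private:
    Arena* arena_;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() == b.arena();
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using ArenaString =
    std::basic_string<char, std::char_traits<char>, ArenaAllocator<char>>;

// Usage sketch: strings built here draw their buffers from the arena.
std::size_t demo_arena_usage() {
    Arena arena(64 * 1024);
    ArenaAllocator<char> alloc(arena);
    ArenaString key("a key that is too long for the small-string buffer", alloc);
    ArenaString value(200, 'v', alloc);
    return arena.used();
}
```

The arena must outlive every string that uses it, and strings that grow repeatedly will leave their old buffers behind in the arena until reset() is called, so sizing the arena is part of the tuning mentioned above.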
@sehe is exactly right. There are many ways.
EDIT:
To finally address your second question, strings using different allocators can play nicely together, and usage should be transparent.
For example:
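#include <iostream>
#include <memory>
#include <string>

class myalloc : public std::allocator<char> {};
myalloc customAllocator;

int main()
{
    // The constructor takes a const std::allocator<char>&, so customAllocator
    // is converted to its base class; mystring is still a plain std::string.
    std::string mystring(customAllocator);
    std::string regularString = "test string";
    mystring = regularString;  // same type on both sides, so this just copies
    std::cout << mystring;
    return 0;
}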
This is a fairly silly example and, of course, uses the same workhorse code under the hood. Note what it does and does not show: the std::string constructor takes a const std::allocator<char>&, so customAllocator is converted to its std::allocator<char> base and both strings have exactly the same type, which is why the assignment is transparent here. Implementing a useful allocator that supplies the full interface required by the STL without just disguising the default std::allocator is not as trivial. This seems to be a decent write up covering the concepts involved. Keep in mind that the allocator is a template parameter of std::basic_string, so a string that uses a genuinely different allocator type is a different type from std::string. Such strings do not assign to each other directly, but their characters can be copied across through data() and size() or through iterators. The STL still does fun things with templates (such as rebind and allocator traits) to homogenize allocator interfaces and tracking.
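A quick way to check how allocator types and string types relate is to ask the compiler. In this sketch MallocAllocator, CustomString, to_std_string and to_custom_string are hypothetical names; the static_assert confirms that a string with its own allocator type is not std::string, and the two helpers show one way to move the characters between the two types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <type_traits>

// Minimal pass-through allocator, used only to obtain a distinct type.
template <class T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() = default;
    template <class U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

using CustomString =
    std::basic_string<char, std::char_traits<char>, MallocAllocator<char>>;

// The allocator is part of the type, so these are two different classes.
static_assert(!std::is_same<std::string, CustomString>::value,
              "a different allocator type yields a different string type");

// Copy the characters across the type boundary explicitly.
std::string to_std_string(const CustomString& s) {
    return std::string(s.data(), s.size());
}
CustomString to_custom_string(const std::string& s) {
    return CustomString(s.data(), s.size());
}
```

In a library it is usually simplest to pick one string alias (such as the Str alias in the question) and convert only at the API boundary.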
What often helps is the creation of a GlobalStringTable.
See if you can find portions of the old NiMain library from the now defunct NetImmerse software stack. It contains an example implementation.
Lifetime
What is important to note is that this string table needs to be accessible between different DLL spaces, and that it is not a static object. R. Martinho Fernandes already warned that the object needs to be created when the application or DLL thread is created / attached, and disposed of when the thread is destroyed or the DLL is detached, and preferably before any string object is actually used. This sounds easier than it actually is.
Memory allocation
Once you have a single point of access that exports correctly, you can have it allocate a memory buffer up-front. If the memory is not enough, you have to resize it and move the existing strings over. Strings essentially become handles to regions of memory in this buffer.
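The "strings become handles" idea can be sketched as follows. StringTable, StringHandle, add and get are hypothetical names and this is not the NetImmerse implementation. Handles store an offset rather than a pointer, so they remain valid when the buffer has to grow and the existing strings are moved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A handle is just a region inside the table's buffer.
struct StringHandle {
    std::size_t offset;
    std::size_t length;
};

class StringTable {
public:
    // Reserve the buffer up-front; it still grows if that is not enough.
    explicit StringTable(std::size_t initial_bytes) { buffer_.reserve(initial_bytes); }

    StringHandle add(const std::string& text) {
        StringHandle handle{buffer_.size(), text.size()};
        buffer_.insert(buffer_.end(), text.begin(), text.end());
        return handle;
    }

    // Returns a copy for convenience; a real table would more likely hand
    // out a pointer/length view to avoid allocating here.
    std::string get(StringHandle handle) const {
        if (handle.length == 0) return std::string();
        return std::string(buffer_.data() + handle.offset, handle.length);
    }

    std::size_t bytes_used() const { return buffer_.size(); }

private:
    std::vector<char> buffer_;
};
```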
Placement new
Something that often works well is the placement new operator, where you can actually specify where in memory your new string object is to be constructed. Instead of allocating, a custom placement operator can simply take the memory location that is passed in as an argument, zero the memory at that location, and return it. You can also keep track of the allocation, the actual size of the string, etc. in the GlobalStringTable object.
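For reference, this is roughly what the standard placement new looks like when used on a std::string. StringSlot, construct_in and destroy_in_place are hypothetical names. Note that this only controls where the string object itself lives; the character buffer of a long string still comes from the string's allocator.

```cpp
#include <new>
#include <string>

// Raw storage with the size and alignment of one std::string.
struct StringSlot {
    alignas(std::string) unsigned char bytes[sizeof(std::string)];
};

// Constructs a std::string inside the slot without allocating the object.
std::string* construct_in(StringSlot& slot, const char* text) {
    return new (static_cast<void*>(slot.bytes)) std::string(text);
}

// Objects created with placement new must be destroyed explicitly.
template <class T>
void destroy_in_place(T* object) {
    object->~T();
}
```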
SOA
Handling the actual memory scheduling is something that is up to you, but there are many possible ways to approach this. Often, the allocated space is partitioned in several regions so that you have several blocks per possible string size. A block for strings <= 4 bytes, one for <= 8 bytes, and so on. This is called a Small Object Allocator, and can be implemented for any type and buffer.
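The size-class scheme can be sketched like this. SmallObjectPool and its members are hypothetical names, the classes are fixed at 4, 8, 16, 32 and 64 bytes for illustration, and the free lists are kept in side vectors for simplicity where a production allocator would more likely thread them through the freed blocks.

```cpp
#include <cstddef>
#include <new>
#include <vector>

class SmallObjectPool {
public:
    enum : std::size_t { kMaxBytes = 64, kClasses = 5 };

    explicit SmallObjectPool(std::size_t bytes) : buffer_(bytes), used_(0) {}
    SmallObjectPool(const SmallObjectPool&) = delete;
    SmallObjectPool& operator=(const SmallObjectPool&) = delete;

    // Maps a request to its size class: 0 for <= 4 bytes, 1 for <= 8, ...
    static std::size_t class_index(std::size_t bytes) {
        std::size_t index = 0;
        for (std::size_t size = 4; size < bytes; size *= 2) ++index;
        return index;
    }
    static std::size_t class_size(std::size_t index) {
        return static_cast<std::size_t>(4) << index;
    }

    void* allocate(std::size_t bytes) {
        if (bytes > kMaxBytes) throw std::bad_alloc();
        const std::size_t index = class_index(bytes);
        std::vector<void*>& free_list = free_[index];
        if (!free_list.empty()) {  // reuse a previously freed block
            void* block = free_list.back();
            free_list.pop_back();
            return block;
        }
        // Otherwise carve a fresh block out of the upfront buffer.
        const std::size_t size = class_size(index);
        const std::size_t start = (used_ + size - 1) / size * size;
        if (start + size > buffer_.size()) throw std::bad_alloc();
        used_ = start + size;
        return buffer_.data() + start;
    }

    // The caller passes the same size it requested, as allocators do.
    void deallocate(void* block, std::size_t bytes) {
        free_[class_index(bytes)].push_back(block);
    }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
    std::vector<void*> free_[kClasses];
};
```

Because every block in a class has the same size, a freed block can serve any later request of that class, which avoids the fragmentation a general-purpose heap has to deal with.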
If you expect many string operations where small strings are appended to repeatedly, you may change your strategy and allocate larger buffers from the start, so that the number of memmove operations is reduced. Or you can opt for a different approach and use string streams for those.
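With a plain std::string, the "larger buffer from the start" strategy is what reserve() gives you. The helper below (count_regrowths, a hypothetical name) counts how often the capacity changes while appending one character at a time, so the effect of reserving up-front can be compared directly.

```cpp
#include <cstddef>
#include <string>

// Appends `count` characters one at a time and returns how many times the
// string's capacity changed, i.e. how often its buffer was reallocated.
std::size_t count_regrowths(std::string& s, std::size_t count) {
    std::size_t regrowths = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('x');
        if (s.capacity() != capacity) {
            ++regrowths;
            capacity = s.capacity();
        }
    }
    return regrowths;
}
```

Calling s.reserve(n) before the loop should bring the count to zero for up to n appended characters, while a default-constructed string regrows several times on the way.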
String operations
It is not a bad idea to derive from std::basic_string, so that most of the operations still work but the internal storage is actually in the GlobalStringTable, and you can keep using the same STL conventions. This way, you also make sure that all the allocations are within a single DLL, so that there can be no heap corruption from linking different kinds of strings between different libraries, since all the allocation operations are essentially in your DLL (and are rerouted to the GlobalStringTable object).