`std::string` allocations are my current bottleneck - how can I optimize with a custom allocator?

I'm writing a C++14 JSON library as an exercise and to use it in my personal projects.

Using callgrind, I've discovered that the current bottleneck, during a stress test that continuously creates values from strings, is std::string dynamic memory allocation. Precisely, the bottleneck is the call to malloc(...) made from std::string::reserve.

I've read that many existing JSON libraries such as rapidjson use custom allocators to avoid malloc(...) calls during string memory allocations.

I tried to analyze rapidjson's source code but the large amount of additional code and comments, plus the fact that I'm not really sure what I'm looking for, didn't help me much.

How do custom allocators help in this situation?
- Is a memory buffer preallocated somewhere (where? statically?) and std::strings take available memory from it?
Are strings using custom allocators "compatible" with normal strings?
- They have different types. Do they have to be "converted"? (And does that result in a performance hit?)

Code notes:

Str is an alias for std::string.

asked Sep 30 '14 22:09

Vittorio Romeo

2 Answers

By default, std::string allocates memory as needed from the same heap as anything that you allocate with malloc or new. To get a performance gain from providing your own custom allocator, you will need to manage your own "chunk" of memory in such a way that your allocator can deal out the amounts of memory that your strings ask for faster than malloc does. Your memory manager will make relatively few calls to malloc (or new, depending on your approach) under the hood, requesting "large" amounts of memory at once, and then deal out sections of these memory blocks through the custom allocator. To actually achieve better performance than malloc, your memory manager will usually have to be tuned based on the known allocation patterns of your use cases.
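As a rough illustration of the idea above, here is a minimal sketch of a bump ("arena") allocator plugged into std::basic_string. The names Arena, ArenaAllocator and ArenaString are made up for this example, and the sketch assumes a standard library whose basic_string accepts minimal C++11-style allocators. Note that it never reuses freed memory, so a string that grows leaves its old block behind until the arena is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical arena: grabs one large block up front and hands out pieces by
// bumping an offset. Individual pieces are never returned; everything is
// released at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(static_cast<char*>(::operator new(capacity))),
          capacity_(capacity), used_(0) {}
    ~Arena() { ::operator delete(buffer_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(std::size_t bytes, std::size_t alignment) {
        // Round the current offset up to the requested alignment.
        std::size_t start = (used_ + alignment - 1) / alignment * alignment;
        if (start > capacity_ || bytes > capacity_ - start) throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_ + start;
    }
    std::size_t used() const { return used_; }

private:
    char* buffer_;
    std::size_t capacity_;
    std::size_t used_;
};

// Minimal C++11-style allocator that forwards every request to an Arena.
template <class T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // no-op: the arena frees in bulk

    Arena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using ArenaString =
    std::basic_string<char, std::char_traits<char>, ArenaAllocator<char>>;

// Builds a string whose heap storage comes from the arena instead of malloc.
std::size_t arena_bytes_after_building_string() {
    Arena arena(4096);
    ArenaString s("long enough to not fit in the small-string buffer",
                  ArenaAllocator<char>(arena));
    s += ", and growing";
    assert(arena.used() >= s.size());
    return arena.used();
}
```

The arena must outlive every string that uses it, which is why the string is declared after the arena in the example function.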

This kind of thing often comes down to the age-old trade-off of memory use versus execution speed. For example: if you have a known upper bound on your string sizes in practice, you can pull tricks with over-allocating to always accommodate the largest case. While this is wasteful of your memory resources, it can alleviate the performance overhead that more generalized allocation runs into with memory fragmentation. It also makes any calls to realloc essentially constant time for your purposes, because a string never has to move in order to grow.
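To make the over-allocation trick concrete, here is a small sketch of a pool in which every request receives a slot of the same, worst-case size. FixedSlotPool is a hypothetical name, and the slot size stands in for whatever upper bound you know for your strings. Allocation and release are both O(1), and a string living in a slot can grow up to the slot size without being moved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool: one up-front buffer cut into equally sized slots.
// Wasteful for short strings, but there is no fragmentation and no search.
class FixedSlotPool {
public:
    FixedSlotPool(std::size_t slot_bytes, std::size_t slot_count)
        : slot_bytes_(slot_bytes), storage_(slot_bytes * slot_count) {
        free_slots_.reserve(slot_count);
        // Push in reverse so that slot 0 is handed out first.
        for (std::size_t i = slot_count; i > 0; --i) free_slots_.push_back(i - 1);
    }

    // O(1): pops a free slot, or returns nullptr when the pool is exhausted.
    char* allocate() {
        if (free_slots_.empty()) return nullptr;
        std::size_t slot = free_slots_.back();
        free_slots_.pop_back();
        return storage_.data() + slot * slot_bytes_;
    }

    // O(1): marks the slot that p points into as free again.
    void deallocate(char* p) {
        assert(p >= storage_.data() && p < storage_.data() + storage_.size());
        std::size_t offset = static_cast<std::size_t>(p - storage_.data());
        free_slots_.push_back(offset / slot_bytes_);
    }

    std::size_t slot_bytes() const { return slot_bytes_; }
    std::size_t free_count() const { return free_slots_.size(); }

private:
    std::size_t slot_bytes_;
    std::vector<char> storage_;
    std::vector<std::size_t> free_slots_;  // indices of currently unused slots
};
```

A real allocator would also need a fallback for the rare string that exceeds the bound; this sketch simply leaves that to the caller.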

@sehe is exactly right. There are many ways.

EDIT:

To finally address your second question, strings using different allocators can play nicely together, and usage should be transparent.

For example:

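#include <iostream>
#include <memory>
#include <string>

// Derives from std::allocator<char>, so it converts to the default allocator.
class myalloc : public std::allocator<char> {};
myalloc customAllocator;

int main()
{
  // std::string's constructor takes a const std::allocator<char>&.
  std::string mystring(customAllocator);
  std::string regularString = "test string";
  mystring = regularString;
  std::cout << mystring;

  return 0;
}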

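This is a fairly silly example and, of course, uses the same workhorse code under the hood. However, it shows assignment between strings constructed with allocator objects of "different types". Implementing a useful allocator that supplies the full interface required by the STL without just disguising the default std::allocator is not as trivial. This seems to be a decent write up covering the concepts involved. The key to why this works, in the context of your question at least, is that both strings here have the same type: the custom allocator is given as an argument to the constructor, not as a template parameter, and because myalloc derives from std::allocator<char> it is converted (sliced) to the default allocator. Be aware that an allocator which does not derive from std::allocator<char> has to be supplied as the third template parameter of std::basic_string, and that does produce a distinct string type. Such strings can still exchange contents with a std::string, but only by copying the characters (for example through data() and size(), or iterators). The STL still does fun things with templates (such as rebind and traits) to homogenize allocator interfaces and tracking.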
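For the case where the allocator really is a different type, the following sketch shows what "conversion" looks like. CountingAllocator, CustomString, to_custom and to_std are hypothetical names used only for illustration; the allocator just forwards to operator new and counts requests, so you can see that converting copies the characters and costs an allocation once the string is too long for the small-string buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <type_traits>

// Counts how many times the stand-in custom allocator is asked for memory.
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Minimal stateless allocator, a placeholder for a real custom allocator.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

using CustomString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// The allocator is part of the type, so these are two unrelated classes.
static_assert(!std::is_same<std::string, CustomString>::value,
              "a different allocator type gives a different string type");

// "Converting" means copying the characters into storage owned by the
// other allocator; there is no implicit conversion between the two types.
CustomString to_custom(const std::string& s) {
    return CustomString(s.data(), s.size());
}
std::string to_std(const CustomString& s) {
    return std::string(s.data(), s.size());
}

void conversion_example() {
    std::string original = "a string that is too long for the small buffer";
    CustomString converted = to_custom(original);
    assert(to_std(converted) == original);
}
```

If the conversion cost matters, one option is to write the hot paths against a pointer and a length (or iterators) so that either string type can be passed without a copy.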

answered Oct 21 '22 19:10

iwolf

What often helps is the creation of a GlobalStringTable.

See if you can find portions of the old NiMain library from the now defunct NetImmerse software stack. It contains an example implementation.

Lifetime

What is important to note is that this string table needs to be accessible between different DLL spaces, and that it is not a static object. R. Martinho Fernandes already warned that the object needs to be created when the application or DLL thread is created / attached, and disposed of when the thread is destroyed or the DLL is detached, and preferably before any string object is actually used. This sounds easier than it actually is.

Memory allocation

Once you have a single point of access that exports correctly, you can have it allocate a memory buffer up-front. If the memory is not enough, you have to resize it and move the existing strings over. Strings essentially become handles to regions of memory in this buffer.
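A minimal sketch of that idea, with hypothetical names (StringTable, StringHandle): the table owns one buffer, and a "string" is just an offset and a length into it. Using offsets rather than raw pointers is what lets the buffer be resized and moved without invalidating the handles. This is only an illustration and is not taken from the NiMain implementation mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A handle identifies a region of the table's buffer by offset, not by
// pointer, so it survives the buffer being reallocated and moved.
struct StringHandle {
    std::size_t offset;
    std::size_t length;
};

class StringTable {
public:
    // Reserve the buffer up front to avoid early reallocations.
    explicit StringTable(std::size_t initial_bytes) {
        buffer_.reserve(initial_bytes);
    }

    // Appends the characters; the vector grows (and moves every stored
    // string) only when the reserved space runs out.
    StringHandle add(const std::string& text) {
        StringHandle handle{buffer_.size(), text.size()};
        buffer_.insert(buffer_.end(), text.begin(), text.end());
        return handle;
    }

    // Pointer to the characters (not null-terminated); only valid until
    // the next call to add().
    const char* data(StringHandle handle) const {
        assert(handle.offset + handle.length <= buffer_.size());
        return buffer_.data() + handle.offset;
    }

    std::size_t bytes_used() const { return buffer_.size(); }

private:
    std::vector<char> buffer_;
};
```

This version never removes strings; supporting removal is where the memory scheduling described in the SOA section comes in.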

Placement new

Something that often works well is the placement new operator, where you can actually specify where in memory your new string object is constructed. Instead of allocating, the operator can simply take the memory location that is passed in as an argument, zero the memory at that location, and return it. You can also keep track of the allocation, the actual size of the string, etc. in the GlobalStringTable object.
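For reference, this is roughly what the standard placement new looks like in use. FixedString, construct_in_place and destroy are hypothetical names for the sketch. The standard form does not zero anything by itself, so the constructor below clears its own storage; a user-defined placement operator new could do the zeroing instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical fixed-capacity string object that can live anywhere.
struct FixedString {
    std::size_t length;
    char text[32];

    explicit FixedString(const char* source) : length(0) {
        std::memset(text, 0, sizeof(text));  // zero the storage ourselves
        while (length < sizeof(text) - 1 && source[length] != '\0') {
            text[length] = source[length];
            ++length;
        }
    }
};

// Constructs a FixedString in memory the caller already owns.
// No allocation happens here; `where` must be suitably sized and aligned.
FixedString* construct_in_place(void* where, const char* source) {
    assert(where != nullptr);
    return new (where) FixedString(source);
}

// Objects created with placement new are destroyed by calling the
// destructor explicitly; the memory itself stays with its owner.
void destroy(FixedString* object) {
    object->~FixedString();
}
```

A typical caller would pass a pointer into a preallocated buffer, for example storage declared as `alignas(FixedString) unsigned char storage[sizeof(FixedString)];`.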

SOA

Handling the actual memory scheduling is something that is up to you, but there are many possible ways to approach this. Often, the allocated space is partitioned in several regions so that you have several blocks per possible string size. A block for strings <= 4 bytes, one for <= 8 bytes, and so on. This is called a Small Object Allocator, and can be implemented for any type and buffer.
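Here is a small sketch of such a size-class scheme, assuming five classes (4, 8, 16, 32 and 64 bytes) and a fallback to operator new for anything larger. SmallObjectAllocator and the constants are hypothetical, and the caller is expected to pass the same size to deallocate that it passed to allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

constexpr std::size_t kClassCount = 5;       // block sizes 4, 8, 16, 32, 64
constexpr std::size_t kBlocksPerChunk = 64;  // blocks carved from each chunk

class SmallObjectAllocator {
public:
    // Index of the smallest class that fits `bytes`, or kClassCount if
    // the request is too large for any class.
    static std::size_t class_index(std::size_t bytes) {
        std::size_t index = 0;
        for (std::size_t block = 4; index < kClassCount && block < bytes; block *= 2) {
            ++index;
        }
        return index;
    }

    void* allocate(std::size_t bytes) {
        std::size_t c = class_index(bytes);
        if (c == kClassCount) return ::operator new(bytes);  // too big: fall back
        if (free_[c].empty()) refill(c);
        char* block = free_[c].back();
        free_[c].pop_back();
        return block;
    }

    void deallocate(void* p, std::size_t bytes) {
        assert(p != nullptr);
        std::size_t c = class_index(bytes);
        if (c == kClassCount) { ::operator delete(p); return; }
        free_[c].push_back(static_cast<char*>(p));  // reuse later, O(1)
    }

private:
    // Requests one larger chunk and splits it into blocks of class c.
    void refill(std::size_t c) {
        std::size_t block_bytes = std::size_t(4) << c;
        std::unique_ptr<char[]> chunk(new char[block_bytes * kBlocksPerChunk]);
        char* base = chunk.get();
        chunks_.push_back(std::move(chunk));
        for (std::size_t i = 0; i < kBlocksPerChunk; ++i) {
            free_[c].push_back(base + i * block_bytes);
        }
    }

    std::vector<char*> free_[kClassCount];         // one free list per class
    std::vector<std::unique_ptr<char[]>> chunks_;  // owns the chunk memory
};
```

Because blocks within a class are interchangeable, freeing and reallocating never fragments a chunk; the cost is the unused tail of each block.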

If you expect many string operations where small strings are extended repeatedly, you may change your strategy and allocate larger buffers from the start, so that the number of memmove operations is reduced. Or you can opt for a different approach and use string streams for those.
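The effect of allocating a larger buffer from the start can be observed with a plain std::string. The helper below, count_capacity_changes (a name invented for this sketch), appends characters one at a time and counts how often the capacity changes, which is when the contents are moved to a new buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends total_chars characters one by one and returns how many times
// the string's capacity changed along the way. Each change means the
// contents were copied into a new buffer.
std::size_t count_capacity_changes(std::size_t total_chars, bool reserve_first) {
    std::string s;
    if (reserve_first) s.reserve(total_chars);  // one allocation up front

    std::size_t changes = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < total_chars; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++changes;
            last_capacity = s.capacity();
        }
    }
    assert(s.size() == total_chars);
    return changes;
}
```

With reserve_first set, the count stays at zero because the capacity is already sufficient; without it, the exact number depends on the growth policy of your standard library.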

String operations

It is not a bad idea to derive from std::basic_string, so that most of the operations still work but the internal storage is actually in the GlobalStringTable, and you can keep using the same STL conventions. This way, you also make sure that all the allocations are within a single DLL, so that there can be no heap corruption by linking different kinds of strings between different libraries, since all the allocation operations are essentially in your DLL (and are rerouted to the GlobalStringTable object).

answered Oct 21 '22 19:10

StarShine

Tags: c++, memory-management, string, optimization, c++14
