
Memory-efficient C++ strings (interning, ropes, copy-on-write, etc) [closed]

My application has memory problems: it copies lots of strings around, uses the same strings as keys in many hash tables, and so on. I'm looking for a base class for my strings that makes this very efficient.

I'm hoping for:

  • string interning (multiple strings with the same value share the same memory),
  • copy-on-write (I think this comes for free in nearly all std::string implementations),
  • ropes, as a bonus (for O(1)-ish concatenation).

My platform is g++ on Linux (but that is unlikely to matter).

Do you know of such a library?

Paul Biggar asked Jul 12 '09 13:07




4 Answers

"copy-on-write (I think this comes for free in nearly all std::string implementations)"

I don't believe this is the case any longer. Copy-on-write causes problems when you modify strings through iterators: either you get unwanted results (no copy is made, so both strings are modified), or you pay unnecessary overhead (the iterators can no longer be implemented purely as pointers, because they must perform additional checks whenever they are dereferenced).
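To make the trade-off concrete, here is a minimal sketch of a copy-on-write string. `CowString`, `detach` and `shares_buffer_with` are hypothetical names invented for this illustration; this is not how any real std::string is written, and it ignores thread safety. It only shows that every mutable access has to check for sharing, and possibly copy, before handing out a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, single-threaded copy-on-write string (illustration only).
class CowString {
public:
    explicit CowString(const char* s)
        : data_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor copies the shared_ptr, so copies
    // share one buffer until somebody asks for mutable access.

    // Read-only access never copies.
    char at(std::size_t i) const { return (*data_)[i]; }

    // Mutable access must unshare first: the caller may write through
    // the returned reference later, and we cannot observe that write.
    char& operator[](std::size_t i) {
        detach();
        return (*data_)[i];
    }

    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Take a private copy of the buffer if it is currently shared.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};

void cow_demo() {
    CowString a("hello");
    CowString b = a;                    // cheap: both share one buffer
    assert(a.shares_buffer_with(b));
    b[0] = 'j';                         // b detaches before the write
    assert(!a.shares_buffer_with(b));
    assert(a.at(0) == 'h' && b.at(0) == 'j');
}
```

Note that `b[0]` triggers the copy even if the caller only reads the result, which is the extra cost described above. Skipping the `detach()` call gives the other failure mode: both strings change.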

Additionally, all modern C++ compilers perform NRVO (named return value optimization), which eliminates the copy when a string is returned by value in most cases. Returning strings was one of the most common use cases for copy-on-write semantics, so copy-on-write has been removed due to the aforementioned downsides.
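A small way to check the return-value point yourself is sketched below, assuming a C++11 or later compiler. `Tracked` and `make_greeting` are hypothetical names made up for this example. The type counts copy constructions, so you can see that returning a named local by value does not copy the string: the compiler either applies NRVO or, failing that, falls back to a move.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper that counts how often its payload is copied.
struct Tracked {
    static int copies;
    std::string text;

    explicit Tracked(std::string t) : text(std::move(t)) {}
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
    Tracked(Tracked&& other) noexcept : text(std::move(other.text)) {}
};
int Tracked::copies = 0;

// Returns a named local by value. With NRVO the compiler constructs
// `result` directly in the caller's storage; without it, C++11 rules
// treat the returned local as an rvalue and move it instead.
Tracked make_greeting() {
    Tracked result("hello, world");
    result.text += "!";
    return result;
}

void nrvo_demo() {
    Tracked t = make_greeting();
    assert(Tracked::copies == 0);       // no copy constructor call
    assert(t.text == "hello, world!");
}
```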

Konrad Rudolph answered Oct 04 '22 10:10


If most of your strings are immutable, the Boost Flyweight library might suit your needs.

It will do the string interning, but I don't believe it does copy-on-write.
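If you just want to see what interning buys you before pulling in a library, the idea can be sketched with the standard library alone. This is not Boost Flyweight's API or design; `InternPool` and `intern` are hypothetical names, and the sketch assumes single-threaded use and never frees an entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical single-threaded intern pool (illustration only).
class InternPool {
public:
    // Returns a pointer to the pool's single stored copy of `s`.
    // Pointers to std::unordered_set elements stay valid when the
    // table rehashes, so handles remain usable as the pool grows.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

void intern_demo() {
    InternPool pool;
    const std::string* a = pool.intern("user_id");
    const std::string* b = pool.intern(std::string("user_") + "id");
    const std::string* c = pool.intern("session");
    assert(a == b);             // equal values share one stored string
    assert(a != c);
    assert(pool.size() == 2);   // only two distinct strings are kept
}
```

Because equal strings map to the same pointer, comparing or hashing an interned string can be done on the pointer instead of the characters, which helps when the same strings are used as keys in many hash tables.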

Ferruccio answered Oct 04 '22 09:10


Andrei Alexandrescu's 'Policy Based basic_string implementation' may help.

graham.reeds answered Oct 04 '22 09:10


Take a look at The Better String Library by the legendary Paul Hsieh.

Indy9000 answered Oct 04 '22 10:10