Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ - String capacity pattern

I noticed that the string capacities in C++ follow this pattern:

  • initial string size is 15
  • for any string that is larger than a particular size 'block', the capacity is doubled.

Here are the string capacities for strings up to length 500:

15
30
60
120
240
480
960

Capacities were found with the following C++ program:

#include <iostream>
#include <vector>

using namespace std;

string getstr(int len) { 
    string s = "";
    for (int i=0; i<len; i++) {
        s.append("1");
    }
    return s;
}

int main() {
    vector<int> capacities;
    int prevcap;
    for (int i=0; i<500; i++) {
        int cap = getstr(i).capacity();
        if (cap > prevcap) {
            capacities.push_back(cap);
            prevcap = cap;
        }
    }
    for (int i : capacities) {
        cout << i << endl;
    }
}

What is the logic behind choosing this algorithm? Do the numbers (here 15 and 2) have any significance, or have they been randomly chosen? Also, does this algorithm vary from compiler to compiler? (This was compiled and tested with g++ 5.4.0 on Ubuntu 16.04) Any insights are appreciated.

like image 390
Aniruddha Deb Avatar asked Sep 29 '20 05:09

Aniruddha Deb


People also ask

What is capacity of string in C?

The capacity of a string reflects how much memory the string allocated to hold its contents. This value is measured in string characters, excluding the NULL terminator. For example, a string with capacity 8 could hold 8 characters. size_type string::capacity() const.

Why is size 32 string?

So this string implementation is 32 because that's the way it was built in this implementation and it will by 16 in other implementations and 64 in yet another. The size of the string will (like water) depend on the environment it is used in.

How many characters can a string fit?

Therefore, the maximum length of String in Java is 0 to 2147483647. So, we can have a String with the length of 2,147,483,647 characters, theoretically.

Are strings dynamically allocated in C++?

Strings Are Dynamically Allocated act on fixed-size character arrays. To implement this flexibility, strings are allocated dynamically. Dynamic allocation is expensive compared to most other C++ features, so no matter what, strings are going to show up as optimization hot spots.


1 Answers

Doubling is a well known method. It amortizes the cost of reallocation making push_back a constant time operation (as it is required to be). The 'obvious' alternative of adding a fixed size would make push_back a linear time operation. Other patterns are possible though, any multiplicative increase would work theoretically, and I once read an article advocating that each increased capacity should be taken from the next term in a Fibonacci sequence.

I imagine the initial size of 15 is chosen with short string optimization (SSO) in mind. With SSO the string data is stored in the string object itself instead of in separately allocated memory. I imagine 15 is the largest short string that can be accommodated in this particular implementation. Knowing what sizeof(std::string) is might shed some light on this.

like image 96
john Avatar answered Oct 20 '22 16:10

john