
Why do Rust Strings have no Short String Optimizations (SSOs)?

Tags: string, rust

I've looked into the source code for String, and found that it's implemented in terms of Vec, which has no form of small object optimizations:

pub struct String {
    vec: Vec<u8>,
}
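A quick way to check the layout described above is to compare sizes. This is only a sketch: it assumes a current compiler, where Vec<u8> takes three machine words (pointer, capacity, length), which is not a documented language guarantee. `string_words` is a hypothetical helper name used for illustration.

```rust
use std::mem::size_of;

/// Size of `String` in machine words (hypothetical helper, for illustration only).
fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

fn main() {
    // String is a thin wrapper around Vec<u8>, so the two sizes match.
    assert_eq!(size_of::<String>(), size_of::<Vec<u8>>());
    println!("String occupies {} machine words", string_words());
}
```

There is no spare tag or inline buffer in that layout, so every non-empty String owns a heap allocation.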

Coming from C++, where every major standard library uses Short String Optimizations (SSOs) for std::string, this is very surprising. A lot of use cases of strings involve very short strings, such as:

  1. if you're writing a compiler, you'll have strings for keywords and tokens, like "==", "pub", "delete"
  2. if you're stringifying an enum, all of the constant names are typically short enough to fit into an SSO buffer
  3. if you're printing stuff using format strings, the format strings are very rarely so long that they wouldn't fit into SSOs
  4. if you're parsing a config file with keys and values, both the keys and the values are typically very short, like setting: enabled
  5. if you're storing regular expressions, those can often fit into SSOs too
  6. if you're storing a dictionary, pretty much the entirety of it can be SSO'd, because e.g. English words are fairly short

Given this, what is the rationale for not using any SSOs for the default String? Would it be possible to add that feature retroactively? Is there any profiling data to demonstrate whether SSOs are helpful or not?

Notes on SSOs in C++

SSO works by reusing the memory inside the std::string object that would otherwise hold the pointer, size, and capacity. For a short string, that memory instead stores:

  • the size of the internal string (can be just one byte)
  • the string data in the container (usually ~20 bytes max length)

It's also possible that only the capacity field is reused, and the pointer is kept but points into the string object itself. All of this is usually done through a union, and would be possible in Rust as well.
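To make the layout above concrete, here is a minimal sketch of what an SSO string could look like in Rust. Everything in it is hypothetical: `SmallString`, `INLINE_CAP`, and the 22-byte inline capacity are chosen for illustration and are not taken from any real library. It uses an enum rather than a union to stay in safe Rust, so the layout may be less compact than a hand-packed C++ implementation.

```rust
/// Inline capacity in bytes (an arbitrary choice for this sketch).
const INLINE_CAP: usize = 22;

/// Hypothetical small-string type, for illustration only.
enum SmallString {
    /// Short strings live directly inside the object.
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    /// Longer strings fall back to a heap-allocated String.
    Heap(String),
}

impl SmallString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallString::Inline { len: s.len() as u8, buf }
        } else {
            SmallString::Heap(s.to_owned())
        }
    }

    /// Every access has to branch on the storage mode.
    fn as_str(&self) -> &str {
        match self {
            SmallString::Inline { len, buf } => {
                // The bytes were copied whole from a valid &str, so this cannot fail.
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SmallString::Heap(s) => s.as_str(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallString::Inline { .. })
    }
}

fn main() {
    let short = SmallString::new("pub");
    let long = SmallString::new("a string that is too long for the inline buffer");
    println!("{:?} inline: {}", short.as_str(), short.is_inline());
    println!("{:?} inline: {}", long.as_str(), long.is_inline());
}
```

Note the `match` in `as_str`: in this design every read of the string data pays for a check of the storage mode.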

asked Jun 28 '26 20:06 by Jan Schultke


1 Answer

SSO is not always a win - it optimizes short strings at the expense of long strings. Rust prefers to have consistent performance characteristics, especially in the standard library, and let external crates handle the other cases. This way, users of the standard library are never pessimized, and if needed, they can use external crates and still enjoy the optimizations for their particular use-case.

Moreover, SSO in Rust is potentially more expensive than SSO in C++. C++ has move constructors, so a string can always keep a pointer to its data, whether the data is stored on the heap or inline in the object itself, and accessing the data is then branch-free (although I don't know whether C++ standard libraries actually use this ability). Rust cannot do that: when the object is moved, the pointer to the inline storage would have to be updated, but in Rust moves are always simple memcpys and run no user code.
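The point about moves can be observed directly. The sketch below uses a hypothetical `InlineBuf` type (not part of any library) and records the address of its inline buffer before and after the value is moved into a Box. A pointer to that buffer stored inside the object would still hold the old address after the move.

```rust
/// Hypothetical type with inline storage, for illustration only.
struct InlineBuf {
    buf: [u8; 16],
}

/// Returns the address of the inline buffer before and after a move.
fn addresses_across_move() -> (usize, usize) {
    let value = InlineBuf { buf: [0; 16] };
    let before = value.buf.as_ptr() as usize;

    // Moving into a Box copies the bytes to the heap; no user code runs,
    // so nothing could fix up a self-referential pointer here.
    let boxed = Box::new(value);
    let after = boxed.buf.as_ptr() as usize;

    (before, after)
}

fn main() {
    let (before, after) = addresses_across_move();
    println!("before: {before:#x}, after: {after:#x}");
    println!("address changed: {}", before != after);
}
```

This is why an SSO string in Rust has to decide between inline and heap storage on each access instead of following a stored pointer.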

Finally, while Rust could originally have defined String to use SSO, this is no longer possible: the documentation guarantees that the buffer is always stored on the heap, and String::into_bytes() (which returns a Vec<u8>) guarantees that it does not copy the data.
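The no-copy behaviour of into_bytes() can be checked by comparing buffer addresses. In this small sketch, `into_bytes_reuses_buffer` is a hypothetical helper name; `String::as_ptr` (via deref to str), `String::into_bytes`, and `Vec::as_ptr` are standard library methods.

```rust
/// Returns true if the Vec<u8> produced by `into_bytes` points at the
/// same heap buffer the String owned (hypothetical helper, for illustration).
fn into_bytes_reuses_buffer(s: String) -> bool {
    let before = s.as_ptr();
    let bytes: Vec<u8> = s.into_bytes();
    before == bytes.as_ptr()
}

fn main() {
    let reused = into_bytes_reuses_buffer(String::from("pub"));
    println!("buffer reused: {reused}");
}
```

With an inline buffer, a three-byte string like "pub" would have no heap buffer to hand over, so the conversion would need to allocate and copy.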

answered Jul 01 '26 09:07 by Chayim Friedman
