This is how the str type is used:
let hello = "Hello, world!";
// with an explicit type annotation
let hello: &'static str = "Hello, world!";
However, annotating the variable as a plain str does not compile:
let hello: str = "Hello, world!";
The compiler reports: expected `str`, found `&str`
Why is the default type of the text not just str, unlike all primitive types, vectors, and String? Why is it a reference?
Rust does not hide the ugliness and complexity of string handling from developers the way some other languages do, which helps avoid critical mistakes later. By construction, both string types are valid UTF-8, so safe code cannot produce a malformed string.
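As a minimal sketch of what "valid UTF-8 by construction" means in practice: the checked constructors in the standard library validate the bytes and return an error instead of producing an invalid string. The helper names parse_utf8 and parse_utf8_owned are hypothetical, chosen only for illustration.

```rust
/// Hypothetical helper: view raw bytes as a string slice.
/// `std::str::from_utf8` validates the bytes and fails rather than
/// handing out a `&str` that is not UTF-8.
fn parse_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Hypothetical helper: the owned counterpart. `String::from_utf8`
/// performs the same validation before returning a `String`.
fn parse_utf8_owned(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

Calling parse_utf8(b"hello") yields the slice, while a byte sequence such as [0xff, 0xfe] is rejected.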
String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data. str is an immutable sequence of UTF-8 bytes of dynamic length somewhere in memory. Since its size is unknown at compile time, it can only be handled behind a pointer.
The str type, also called a 'string slice', is the most primitive string type. It is usually seen in its borrowed form, &str. It is also the type of string literals, &'static str. String slices are always valid UTF-8.
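To illustrate "str only behind a pointer", here is a small sketch using the standard library. A borrowed &str can point into a String's buffer, and Box<str> is an owned pointer to the same unsized type. The function names first_word and boxed are hypothetical.

```rust
/// Borrow part of some string data as a string slice.
/// The returned `&str` points into the caller's data; nothing is copied.
/// Returns an empty slice if the text contains no words.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// An owned `str` on the heap: `Box<str>` is also "str behind a pointer".
/// Unlike `String`, it cannot grow.
fn boxed(text: &str) -> Box<str> {
    Box::from(text)
}
```

A &String passed to first_word is converted to &str automatically through deref coercion, which is one reason function parameters usually take &str.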
The design decision that strings and slices are only accessible via references has many advantages:
str is not easily managed on the stack, while &str has a fixed size on the stack: a pointer plus a length (the variable-length data itself lives elsewhere, for example in the program binary or on the heap). Note that all other primitive types have a fixed size, as does every reference (though not the data it points to) and every struct (which is a composition of such types).
&str is an immutable reference. If you could define variables of type str, you would have to give semantics to let mut s: str = "str";. An immutable string on the stack is hard to manage; a string that could be appended to is even harder.
A str held by value would mean that every move has to copy all the chars, which costs performance. Just copying the reference and keeping the referenced data constant in one place is cheaper. A by-value str would not really be a zero-cost abstraction.
str is not the only type that usually appears only behind a reference, as &str (the same holds for slices, like &[i8]), so changing how strings are handled would make other behavior look odd (or it would have to change accordingly).
Suppose a function creates a str locally. Now you want to return a &str from this function. This cannot work because a reference lives at most as long as the value it points to (try this with any primitive type). Since the str is a locally created value, it cannot outlive the function. The convenience that a string literal is always a reference to a static string resolves this problem. Without it, you would have to write additional code to put your owned str into a static variable so that you could return a &str. And since a static reference is the default behavior I need, it is quite convenient that I can write it with little overhead.
I will try to give a different perspective. In Rust there is a general convention: if you have a variable of some type T, it means that you own the data associated with T. If you have a variable of type &T, then you don't own the data.
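The ownership convention can be sketched with two functions, one taking an owned String (a T) and one taking a borrowed &str (a reference). The names consume and inspect are hypothetical.

```rust
/// Takes ownership: the caller gives up `text`, and its heap buffer
/// is freed when this function returns.
fn consume(text: String) -> usize {
    text.len()
}

/// Borrows: the caller keeps ownership and can keep using the data
/// after the call.
fn inspect(text: &str) -> usize {
    text.len()
}
```

A String can be passed to inspect (as &s) any number of times, but after consume(s) the variable s has been moved and the compiler rejects further uses of it.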
Now let's consider a heap-allocated string. According to this convention, there should be a non-reference type that represents ownership of the allocation. And indeed such a type exists: String.
There is also a different kind of string: &'static str. These strings are not owned by anyone: exactly one instance of the string is placed inside the compiled binary file, and only pointers are passed around. There is no allocation and no deallocation, hence no ownership. In a sense, static strings are owned by the compiler, not by the programmer. This is why String cannot be used to represent a static string.
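A short sketch of the difference, with hypothetical function names greeting and greeting_for: a literal can be returned as &'static str without any allocation, while text built at run time needs an owner and is returned as a String.

```rust
/// A string literal lives in the compiled binary, so a reference to it
/// is valid for the whole program: `&'static str`. No allocation happens.
fn greeting() -> &'static str {
    "Hello, world!"
}

/// Data built at run time has to be owned by someone; returning a
/// `String` hands ownership of the heap buffer to the caller.
fn greeting_for(name: &str) -> String {
    format!("Hello, {}!", name)
}
```

Trying to return a &str that points into a String created inside the function would be rejected by the borrow checker, because the String is dropped when the function returns.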
Alright, so why not use &String to represent a static string? Imagine a world where the following code is valid Rust:
let s: &'static String = "hello, world!";
This might look fine, but implementation-wise, this is suboptimal:
String itself has a pointer to the actual data, so &String has to be basically a pointer to a pointer. This violates the zero-cost abstraction principle: why introduce an extra level of indirection when the compiler statically knows the address of "hello, world!"?
Even if the compiler were somehow smart enough to decide that the extra pointer is not needed here (which would lead to a bunch of other problems), String itself still contains three 8-byte fields (on a 64-bit target): a data pointer, the data length, and the capacity of the allocated buffer.
However, when we are talking about static strings, capacity makes zero sense: static strings are read-only.
So, in the end, when the compiler sees &'static String, we actually want it to store only a data pointer and length; otherwise, we are paying for what we will never use, which is against the zero-cost abstraction principle. This looks like arcane wizardry to demand from the compiler: the variable type is &String, but the variable itself is anything but a reference to String.
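The size argument can be checked with std::mem::size_of. This is only a sketch: the exact layout of String is not formally guaranteed by the language, so the numbers in the comments are what current toolchains are assumed to produce. The function name sizes_in_words is hypothetical.

```rust
use std::mem::size_of;

/// Sizes, in machine words (`usize`), of the three types discussed here.
/// Returns the sizes of (String, &String, &str).
fn sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<String>() / word,  // data pointer + length + capacity
        size_of::<&String>() / word, // thin pointer to the String struct
        size_of::<&str>() / word,    // data pointer + length
    )
}
```

A &String is smaller than a &str, but reaching the text through it takes two pointer hops, whereas a &str carries the data address and the length directly.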
To make this work, we actually need a different type, not &String, that only holds a data pointer and length. And here it is: &str! It is better than &String in a number of ways:
It does not store a capacity field.
It holds a single pointer to the data instead of a pointer to a pointer.
Semantically, str can be regarded as a variable-sized type (the data itself), so &str is just a reference to the data.
Now you might wonder: why not introduce str instead of &str? Remember the convention: having str would imply that you own the data, which you don't. Hence &str.