
Why does str primarily exist in its borrowed form? [duplicate]

This is how the str type is used:

let hello = "Hello, world!";

// with an explicit type annotation
let hello: &'static str = "Hello, world!";

However, let hello: str = "Hello, world!"; fails to compile with the error: expected `str`, found `&str`

Why is the default type of a string literal not just str, given that all primitive types, vectors, and String can be used directly as values? Why is it a reference?

asked Apr 12 '20 16:04 by QurakNerd



2 Answers

The design decision that strings and slices are only accessible via references has many advantages:

  1. Strings can have any length, so a variable of type str is not easily managed on the stack, whereas &str has a fixed size on the stack (a pointer plus a length) while the variable-length data lives elsewhere (on the heap or in the program's static data). Note that all other primitive types have a fixed size, as does every reference (regardless of the data it points to) and every struct (which is a composition of such types).
  2. &str is an immutable reference. If you could define variables of type str, you would have to give semantics to let mut s: str = "str";. An immutable string on the stack is hard to manage; a string that could be appended to is even harder.
  3. An owned str would mean that every move has to copy all of its bytes, which costs performance. Copying just the reference and leaving the referenced data where it is is cheaper. Copying the data on every move would not really be a zero-cost abstraction.
  4. str is not the only type that normally appears only behind a reference, as in &str (the same holds for slices, like &[i8]), so a change to the handling of strings would make other behavior odd (or it would have to be changed accordingly).
  5. Let us assume that a function could manage variables of type str. Now you want to return a &str from this function. This cannot work because a reference lives at most as long as the value it points to (try this with any primitive type). Since the str is a locally created value, it cannot outlive the function. Conveniently, a string literal is always a reference to a static string, which resolves this problem. Otherwise you would have to write additional code to put your owned str into a static variable so that you could return a &str. And since a static reference is the default behavior I need, it is quite convenient that I can write it with little overhead.
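Points 1 and 5 can be checked with a small program. This is only a sketch: the function name greeting is made up for illustration, and the byte counts assume the ASCII literals shown.

```rust
use std::mem::size_of_val;

// A string literal has type &'static str, so it can be returned from a
// function without any extra work: the data is not local to the function.
// (`greeting` is a hypothetical name used only for this example.)
fn greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    // The str data itself has no fixed size; its length is a run-time value.
    assert_eq!(size_of_val("Hello"), 5);
    assert_eq!(size_of_val(greeting()), 13);

    // The references, however, all have the same fixed size,
    // no matter how long the text they point to is.
    let short: &str = "a";
    let long: &str = "a much longer piece of text";
    assert_eq!(size_of_val(&short), size_of_val(&long));

    println!("{}", greeting());
}
```

Note that size_of_val is given the str in the first two assertions and the reference itself (a &&str argument) in the last one.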
answered Oct 16 '22 01:10 by CoronA


I will try to give a different perspective. In Rust there is a general convention: if you have a variable of some type T, it means that you own the data associated with T. If you have a variable of type &T, then you don't own the data.

Now let's consider a heap-allocated string. According to this convention, there should be a non-reference type that represents ownership of the allocation. And indeed such a type exists: String.

There is also a different kind of string: &'static str. These strings are not owned by anyone: exactly one instance of the string is placed inside the compiled binary, and only pointers are passed around. There is no allocation and no deallocation, hence no ownership. In a sense, static strings are owned by the compiler, not by the programmer. This is why String cannot be used to represent a static string.
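To make the ownership difference concrete, here is a minimal sketch. The helper names static_text and owned_text are invented for this example and are not part of any library.

```rust
// Hypothetical helper: returns a reference to data baked into the binary.
fn static_text() -> &'static str {
    "hello, world!"
}

// Hypothetical helper: allocates a fresh buffer that the caller then owns.
fn owned_text() -> String {
    String::from("hello, world!")
}

fn main() {
    let literal: &'static str = static_text();
    let owned: String = owned_text();

    // Same contents, different ownership story.
    assert_eq!(literal, owned);

    // Two live owned strings each have their own allocation.
    let other = owned_text();
    assert_ne!(owned.as_ptr(), other.as_ptr());

    // A &'static str is Copy: copying it only copies the reference.
    let copy = literal;
    assert_eq!(copy, literal);

    // A String is moved, and its buffer is freed when the owner is dropped.
    let moved = owned;
    drop(moved);
    // `owned` can no longer be used here; `literal` and `copy` still can.
    println!("{literal} / {copy} / {other}");
}
```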

Alright, so why not use &String to represent a static string? Imagine a world where the following code is valid Rust:

let s: &'static String = "hello, world!";

This might look fine, but implementation-wise, this is suboptimal:

  1. String itself has a pointer to the actual data, so &String has to be basically a pointer to a pointer. This violates the zero-cost abstraction principle: why do we introduce an excessive level of indirection, when actually the compiler statically knows the address of "hello, world!"?
  2. Even if the compiler were somehow smart enough to decide that an excessive pointer is not needed here (which would lead to a bunch of other problems), String itself still contains three pointer-sized fields (8 bytes each on a 64-bit target):

    • Data pointer;
    • Data length;
    • Allocation capacity - the total size of the allocated buffer, which lets us know how much free space there is after the data.

    However, when we are talking about static strings, capacity makes zero sense: static strings are read-only.

So, in the end, when the compiler sees &'static String, we actually want it to store only a data pointer and length - otherwise, we are paying for what we will never use, which is against the zero-cost abstraction principle. This looks like arcane wizardry that we want from the compiler: the variable type is &String but the variable itself is anything but a reference to String.
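The layout argument can be observed directly. The following is a sketch that relies on how the standard library currently lays out String (three machine words); the language does not formally promise that layout, so treat the first assertion as an observation rather than a guarantee.

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // String: data pointer + length + capacity (current std layout, assumed).
    assert_eq!(size_of::<String>(), 3 * word);

    // &String: a single thin pointer to that three-word struct,
    // so reaching the text takes two pointer hops.
    assert_eq!(size_of::<&String>(), word);

    // &str: data pointer + length, and nothing else.
    assert_eq!(size_of::<&str>(), 2 * word);

    // Capacity only matters for a growable buffer.
    let mut s = String::with_capacity(32);
    s.push_str("hello");
    assert_eq!(s.len(), 5);
    assert!(s.capacity() >= 32);

    // The borrowed view carries only the length of the actual data.
    let view: &str = s.as_str();
    assert_eq!(view.len(), 5);
}
```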

To make this work, we actually need a different type, not &String, that only holds a data pointer and length. And here it is: &str! It is better than &String in a number of ways:

  1. Does not have an excessive level of indirection - only one pointer;
  2. Does not store capacity, which would be meaningless in many contexts;
  3. No black magic: we define str as a variable-sized type (the data itself), so &str is just a reference to the data.
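A practical consequence is that one function taking &str works for static data, for heap-allocated strings, and for parts of either. A minimal sketch, where count_words is a made-up helper for illustration:

```rust
// Hypothetical helper: it only needs to read the text, so it takes &str.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // Data stored in the binary.
    let literal: &'static str = "static string data";
    assert_eq!(count_words(literal), 3);

    // Data stored on the heap; &String is coerced to &str at the call site.
    let owned = String::from("heap allocated string data");
    assert_eq!(count_words(&owned), 4);

    // A sub-slice is just another pointer + length into the same buffer;
    // no bytes are copied. The first 14 bytes are "heap allocated".
    let part: &str = &owned[..14];
    assert_eq!(part, "heap allocated");
    assert_eq!(count_words(part), 2);
}
```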

Now you might wonder: why not introduce str instead of &str? Remember the convention: having str would imply that you own the data, which you don't. Hence &str.

answered Oct 16 '22 03:10 by kreo