
Why is there no dedicated method for creating a String from a UTF-8-encoded array?

Tags:

rust

I need to construct a String from an array of bytes (not a Vec). This works:

let buf2 = [30, 40, 50];
let string2 = std::str::from_utf8(&buf2).unwrap().to_string();
  1. Why is there no dedicated method on String for an array/slice?
  2. Why is the parameter of from_utf8 not generic?
  3. Is the snippet above idiomatic Rust?

I ended up not needing the String and going with &str, but the questions remain.

nicolai asked Dec 10 '22 at 23:12



1 Answer

There are two from_utf8 functions. One goes from &[u8] to &str, the other from Vec<u8> to String. Why two? What's the difference? And why isn't there one that goes straight from &[u8] to String?

Cheap conversions

Let's consult the official Rust docs.

std::str::from_utf8(v: &[u8]) -> Result<&str, Utf8Error>

A string slice (&str) is made of bytes (u8), and a byte slice (&[u8]) is made of bytes, so this function converts between the two. Not all byte slices are valid string slices, however: &str requires that it is valid UTF-8. from_utf8() checks to ensure that the bytes are valid UTF-8, and then does the conversion.

Source

If a &[u8] byte slice contains valid UTF-8 data, a &str string slice can be created by simply reusing those bytes as the string data. Apart from the validation pass, it's a very cheap operation: no allocation and no copying required.
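A minimal sketch to make the "no allocation" point concrete. `view_as_str` is a hypothetical helper that only forwards to `std::str::from_utf8`; comparing `as_ptr()` values shows the returned &str points at the array's own bytes, and a byte that can never occur in UTF-8 (0xFF) makes validation fail with a `Utf8Error`.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: borrow a byte slice as a string slice.
/// On success the returned &str points at the same memory as `bytes`.
fn view_as_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let bytes: [u8; 5] = *b"hello";

    let s = view_as_str(&bytes).expect("valid UTF-8");
    assert_eq!(s, "hello");
    // Same address: the &str is just a validated view of the array's bytes.
    assert_eq!(s.as_ptr(), bytes.as_ptr());

    // 0xFF never appears in UTF-8, so validation stops after the first byte.
    let err = view_as_str(&[b'a', 0xFF, b'b']).unwrap_err();
    assert_eq!(err.valid_up_to(), 1);

    println!("{s}; error: {err}");
}
```

Nothing is allocated on either path; the only work done is the UTF-8 check.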

String::from_utf8(vec: Vec<u8>) -> Result<String, FromUtf8Error>

Converts a vector of bytes to a String. ... This method will take care to not copy the vector, for efficiency’s sake.

Source

The same reasoning applies to String's method. A String is an owned type: it needs to own the underlying bytes, not just point at someone else's bytes. If it took a &[u8], it would have to allocate a new buffer and copy the bytes into it. However, if you already have an owned Vec<u8>, then converting from Vec<u8> to String is a cheap operation: String can consume the Vec<u8> and reuse its existing heap buffer. No allocation required.
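A small sketch of the buffer reuse, assuming only the documented behaviour quoted above ("will take care to not copy the vector"). `into_string` is a hypothetical wrapper around `String::from_utf8`; the pointer and capacity checks show the String took over the Vec's allocation, and on failure `FromUtf8Error::into_bytes` hands the original Vec back.

```rust
use std::string::FromUtf8Error;

/// Hypothetical wrapper: validate an owned byte vector and turn it into a String.
/// On success the Vec's heap buffer is moved into the String, not copied.
fn into_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    let v: Vec<u8> = vec![b'r', b'u', b's', b't'];
    let ptr_before = v.as_ptr();
    let cap_before = v.capacity();

    let s = into_string(v).expect("valid UTF-8");
    assert_eq!(s, "rust");
    // Same buffer and same capacity: no new allocation happened.
    assert_eq!(s.as_ptr(), ptr_before);
    assert_eq!(s.capacity(), cap_before);

    // On invalid input the bytes are returned inside the error instead of being dropped.
    let err = into_string(vec![0xFF, b'x']).unwrap_err();
    assert_eq!(err.into_bytes(), vec![0xFF, b'x']);

    println!("{s}");
}
```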

Explicit heap allocation and copying

Rust wants you to pay attention to memory allocation and copying. Here only the cheap conversions are provided; any allocation or copying requires an extra method call. It's elegant: the fast path is convenient, and the slow path is deliberately more cumbersome. You either need to:

  1. Convert your &[u8] to a &str (cheap) and then convert that to an owned String (expensive); or
  2. Convert your &[u8] to an owned Vec<u8> (expensive) and then convert that to a String (cheap).

Either way, it's your choice, and it requires a second method call.
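The two routes can be sketched side by side. `via_str` and `via_vec` are hypothetical names for illustration; the question's own snippet (`from_utf8(...).unwrap().to_string()`) is the first route.

```rust
use std::str::Utf8Error;
use std::string::FromUtf8Error;

/// Route 1: validate the borrowed bytes as &str (cheap), then copy into a new String.
fn via_str(bytes: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_owned)
}

/// Route 2: copy the bytes into a new Vec<u8>, then validate and reuse that buffer.
fn via_vec(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

fn main() {
    let buf = [b'h', b'i', b'!'];

    let a = via_str(&buf).unwrap();
    let b = via_vec(&buf).unwrap();

    // Both routes produce the same owned String.
    assert_eq!(a, b);
    assert_eq!(a, "hi!");
    println!("{a} {b}");
}
```

Both perform exactly one allocation on valid input. One practical difference: route 1 only allocates after validation succeeds, while route 2 allocates first and returns the copied Vec inside `FromUtf8Error` if validation fails.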

John Kugelman answered Feb 23 '23 at 00:02
