
What is the purpose of `b` here?

Tags: syntax, rust

In this code:

fn main() {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::Hasher;
    
    let mut hasher = DefaultHasher::new();
    
    hasher.write_u32(1989);
    hasher.write_u8(11);
    hasher.write_u8(9);
    hasher.write(b"Huh?"); // <--------
    
    println!("Hash is {:x}!", hasher.finish());
}

What's the point of b? What does it do?

ElementalX asked Dec 12 '25 15:12


2 Answers

A string literal prefixed with b is a byte string literal: the compiler produces a sequence of bytes (u8 values) instead of a &str.

You can read more about it in The Rust Reference. In short, a &str in Rust is always valid UTF-8, so its contents can be viewed as a &[u8] (a slice of unsigned 8-bit integers). A byte string literal gives you those bytes directly, and the result is not required to be valid UTF-8.

The hasher.write(...) function takes a &[u8], that is, a sequence of bytes, as its parameter. Prefixing the literal with b produces bytes directly instead of a &str. The prefix works only on literals; for an existing &str value, call .as_bytes() instead.
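As a minimal sketch of the point above (hash_bytes is a hypothetical helper written for this illustration, not part of the standard library), a byte string literal and the bytes of the matching &str are the same data, so they hash to the same value:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical helper: feed a byte slice to a fresh DefaultHasher.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    // The literal has type &'static [u8; 4]; it coerces to &[u8] here.
    let literal: &[u8] = b"Huh?";

    // An existing &str is viewed as bytes with as_bytes(), not with a prefix.
    let text: &str = "Huh?";

    assert_eq!(literal, text.as_bytes());
    assert_eq!(hash_bytes(b"Huh?"), hash_bytes(text.as_bytes()));

    println!("{:?}", literal); // prints [72, 117, 104, 63]
}
```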

Arijit Dey answered Dec 15 '25 13:12


"some string":

  • is a &str which can be converted easily to &[u8] as UTF-8
  • must result in a sequence of bytes which is valid UTF-8
  • can contain any ASCII or Unicode character
  • can use ASCII escape sequences (\x00 to \x7F) and Unicode escape sequences (\u{0} to \u{10FFFF})

b"some string":

  • is a &[u8; N] (*) which must be validated as UTF-8 to safely convert to &str
  • may result in any arbitrary sequence of bytes
  • can contain only ASCII characters
  • can use full byte escape sequences (\x00 to \xFF)

For example, b"\xFF" is valid while "\xFF" is not: \x escapes in a string literal stop at \x7F, and the byte FF in hex (255 in decimal) never appears in valid UTF-8 anyway. Similarly, "😊" is valid while b"😊" is not, because emoji are part of Unicode and not ASCII.

As an interesting side note, "\u{FF}" (which is the same as "ÿ") does not convert to b"\xFF" but rather to b"\xC3\xBF" because UTF-8 uses multiple bytes to encode characters outside of ASCII.
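These rules can be checked directly. The following is a small sketch; is_valid_utf8 is a hypothetical helper introduced only for this example, built on std::str::from_utf8:

```rust
// Hypothetical helper: true if the bytes form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // U+00FF is outside ASCII, so UTF-8 encodes it as two bytes.
    assert_eq!("\u{FF}".as_bytes(), b"\xC3\xBF");

    // A byte string may hold a lone 0xFF, which is not valid UTF-8.
    assert!(!is_valid_utf8(b"\xFF"));

    // Plain ASCII bytes are always valid UTF-8.
    assert!(is_valid_utf8(b"some string"));

    // Non-ASCII text has to be written as escaped bytes in a byte string.
    assert_eq!("😊".as_bytes(), b"\xF0\x9F\x98\x8A");
}
```

Going from bytes back to &str is the direction that needs validation, which is why from_utf8 returns a Result.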

(*) = &[u8; N] (array) is similar to &[u8] (slice), but it also encodes the length N as part of the type (e.g. N is 11 for b"some string"). The distinction often doesn't matter, as the former coerces to the latter. More details on the distinction can be found in an answer to a different question.
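A short sketch of the array-to-slice coercion described in the footnote (byte_len is a hypothetical function used only to show a &[u8] parameter accepting the literal):

```rust
// Hypothetical function that accepts any byte slice.
fn byte_len(bytes: &[u8]) -> usize {
    bytes.len()
}

fn main() {
    // The length 11 is part of the literal's type.
    let array: &[u8; 11] = b"some string";

    // &[u8; 11] coerces to &[u8]; the length is then only known at run time.
    let slice: &[u8] = array;
    assert_eq!(slice.len(), 11);

    // The same coercion happens implicitly at a call site.
    assert_eq!(byte_len(b"some string"), 11);
    assert_eq!(byte_len(array), 11);
}
```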

kbolino answered Dec 15 '25 12:12


