Why is the size of `char` 4 bytes in Rust?

Tags:

rust

This code shows that char takes 4 bytes:

println!("char : {}", std::mem::size_of::<char>());

Why does it take 4 bytes?
Does the size depend on the platform, or is it always 4 bytes?
If it's always 4 bytes, is it for something special?
Does the compiler guarantee some minimum size for char?

On https://play.rust-lang.org/ I also get 4 bytes.

asked Apr 03 '16 02:04

Angel Angel

3 Answers

First of all: a char in Rust is a unique integral value representing a Unicode scalar value. For example, consider 💩 (aka Pile of Poo, aka U+1F4A9). In Rust it is represented by a char with a value of 128169 in decimal (that is, 0x1F4A9 in hexadecimal):

fn main() {
    let c: char = "💩".chars().next().unwrap();
    println!("💩 is {} ({})", c, c as u32);
}

With that said, the Rust char is 4 bytes because 4 bytes is the smallest power-of-two number of bytes that can hold the integral value of any Unicode scalar value. The largest scalar value is U+10FFFF, which needs 21 bits: too many for 2 bytes, and 3 bytes is not a power of two. The decision was driven by the domain, not by architectural constraints.
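The sizing argument can be checked directly. The following is a minimal sketch; `bits_needed` is a helper name introduced here for illustration, not something from the standard library.

```rust
use std::mem::size_of;

/// Hypothetical helper: number of bits needed to represent `value`
/// (0 needs 0 bits).
fn bits_needed(value: u32) -> u32 {
    u32::BITS - value.leading_zeros()
}

fn main() {
    // char::MAX is the largest Unicode scalar value, U+10FFFF.
    let max = char::MAX as u32;
    println!("char::MAX        = U+{:X}", max);
    // Too many bits for 2 bytes (16 bits); 3 bytes would fit,
    // but 4 is the next power of two.
    println!("bits needed      = {}", bits_needed(max));
    println!("size_of::<char>() = {}", size_of::<char>());
    println!("size_of::<u32>()  = {}", size_of::<u32>());
}
```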

Note: the emphasis on scalar value is because many "characters" as we perceive them are actually graphemes composed of several Unicode code points (for example, a base letter followed by combining characters); in that case multiple char values are required.
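To illustrate the note about graphemes, here is a small sketch comparing two spellings of "é": one precomposed, one built from a base letter plus a combining accent. `char_count` is a helper name made up for this example.

```rust
/// Hypothetical helper: how many `char` values (Unicode scalar values)
/// a string contains.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE: a single scalar value.
    let precomposed = "\u{e9}";
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two scalar values
    // that render as one visible character.
    let decomposed = "e\u{301}";

    println!("{} -> {} char(s)", precomposed, char_count(precomposed));
    println!("{} -> {} char(s)", decomposed, char_count(decomposed));
}
```

Both strings look the same when displayed, but only the first fits in a single char.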

answered Oct 21 '22 18:10

Matthieu M.

char is four bytes. It is always four bytes, it will always be four bytes. Four bytes it be, and four bytes shall it remain.

It's not for anything special; four bytes is simply the smallest power of two in which you can store any Unicode scalar value. Various other languages do the same thing.

answered Oct 21 '22 17:10

DK.

char is four bytes; it doesn't depend on the architecture.

Why? According to Wikipedia's article on UTF-8:

The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use. Four bytes are needed for characters in the other planes of Unicode.

So if you want to represent any possible Unicode character, the compiler must reserve 4 bytes.
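The byte counts quoted above can be observed with `char::len_utf8`. One caveat worth keeping in mind: those variable lengths describe the UTF-8 encoding used by str and String, whereas a standalone char holds the scalar value itself at a fixed width. A minimal sketch, with sample characters chosen here purely for illustration:

```rust
use std::mem::size_of_val;

fn main() {
    // One sample from each UTF-8 length class:
    // 'a' (U+0061), 'é' (U+00E9), '€' (U+20AC), '💩' (U+1F4A9).
    for c in ['a', '\u{e9}', '\u{20ac}', '\u{1f4a9}'] {
        println!(
            "{} U+{:04X}: {} byte(s) in UTF-8, {} byte(s) as a char",
            c,
            c as u32,
            c.len_utf8(),
            size_of_val(&c)
        );
    }

    // A str stores UTF-8, so its length is the sum of the encoded lengths.
    let s = "a\u{1f4a9}";
    println!("{:?} occupies {} bytes as a str", s, s.len());
}
```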

You should also consider Byte Alignment: http://www.eventhelix.com/realtimemantra/ByteAlignmentAndOrdering.htm
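On the alignment point, the Rust documentation for char states that it has the same size and alignment as u32, so a quick comparison like the sketch below can be used to confirm it on a given target:

```rust
use std::mem::{align_of, size_of};

fn main() {
    // char is documented to match u32 in size and alignment.
    println!("size_of::<char>()  = {}", size_of::<char>());
    println!("align_of::<char>() = {}", align_of::<char>());
    println!("align_of::<u32>()  = {}", align_of::<u32>());

    // Array elements are laid out contiguously, so three chars take 12 bytes.
    println!("size_of::<[char; 3]>() = {}", size_of::<[char; 3]>());
}
```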

answered Oct 21 '22 18:10

Fylux
