According to the Rust book, the String::len method returns the number of bytes that make up the string, which may not match its length in characters.
For example, for the following Japanese string, len() returns 30, which is the number of bytes, not the number of characters (10):
let s = String::from("ラウトは難しいです！");
s.len() // returns 30
The only way I have found to get the number of characters is the following:
s.chars().count()
which returns 10, the correct number of characters.
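A self-contained version of the example, for trying it out. It assumes the final "！" is U+FF01 FULLWIDTH EXCLAMATION MARK; with an ASCII "!" (1 byte) the byte count would be 28 rather than 30. Every character in the string is then 3 bytes in UTF-8, which is why len() is exactly three times the character count.

```rust
fn main() {
    // Assumes the last character is U+FF01 (fullwidth "！"), not ASCII "!".
    let s = String::from("ラウトは難しいです！");

    // len() is the length of the underlying UTF-8 byte buffer.
    println!("bytes: {}", s.len()); // 30

    // chars() yields Unicode scalar values; count() walks the whole string.
    println!("chars: {}", s.chars().count()); // 10

    // Each of these characters takes 3 bytes when encoded as UTF-8.
    assert!(s.chars().all(|c| c.len_utf8() == 3));
}
```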
Is there any method on String that returns the character count, aside from the one I am using above?
Is there any method on String that returns the character count, aside from the one I am using above?
No. Using s.chars().count() is correct. Note that this is an O(N) operation, because UTF-8 is a variable-width encoding and every byte has to be examined to find where each character starts, while getting the number of bytes with len() is O(1).
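To illustrate why counting characters is O(N): in valid UTF-8, each scalar value begins with exactly one byte that is not a continuation byte (continuation bytes have the bit pattern 0b10xx_xxxx), so a character count has to inspect every byte. The helper count_chars_by_bytes below is hypothetical and only a sketch of the idea; it is not claimed to be how the standard library implements chars().count().

```rust
/// Hypothetical helper: counts chars by counting the bytes that start a
/// scalar value, i.e. every byte that is not a UTF-8 continuation byte.
/// Only correct for valid UTF-8, which &str guarantees.
fn count_chars_by_bytes(s: &str) -> usize {
    s.bytes()
        .filter(|&b| (b & 0b1100_0000) != 0b1000_0000)
        .count()
}

fn main() {
    // Assumes the fullwidth "！" (U+FF01), as in the question.
    let s = "ラウトは難しいです！";

    // len() reads the stored byte length: O(1).
    let bytes = s.len();

    // Both of these scan every byte: O(N).
    let chars = count_chars_by_bytes(s);
    assert_eq!(chars, s.chars().count());

    println!("{} bytes, {} chars", bytes, chars); // 30 bytes, 10 chars
}
```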
You can browse all the methods on str in the standard library documentation for yourself.
As pointed out in the comments, a char is a specific concept:
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
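One such example involves precomposed characters: "é" can be written as the single scalar value U+00E9, or as "e" followed by U+0301 COMBINING ACUTE ACCENT. Both render the same, but they have different char counts:
fn main() {
    println!("{}", "e\u{301}".chars().count()); // 2: 'e' + combining accent
    println!("{}", "\u{e9}".chars().count()); // 1: precomposed 'é'
}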
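To see where the two counts come from, the following std-only sketch lists the scalar values and byte lengths of each form of "é". The helper scalar_values is hypothetical, written here for illustration only.

```rust
/// Hypothetical helper: formats each Unicode scalar value of `s` as "U+XXXX".
fn scalar_values(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    let decomposed = "e\u{301}"; // 'e' + U+0301 COMBINING ACUTE ACCENT
    let precomposed = "\u{e9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE

    // Two scalar values (3 bytes) versus one scalar value (2 bytes).
    println!("{:?}, {} bytes", scalar_values(decomposed), decomposed.len());
    println!("{:?}, {} bytes", scalar_values(precomposed), precomposed.len());
}
```

Running it prints ["U+0065", "U+0301"], 3 bytes for the decomposed form and ["U+00E9"], 2 bytes for the precomposed one, so chars().count() reflects scalar values rather than what a reader perceives as one letter.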
You may prefer to use graphemes from the unicode-segmentation crate instead:
use unicode_segmentation::UnicodeSegmentation; // 1.6.0

fn main() {
    // true = use extended grapheme clusters
    println!("{}", "e\u{301}".graphemes(true).count()); // 1
    println!("{}", "\u{e9}".graphemes(true).count()); // 1
}