The expected approach of String::truncate(usize) fails because it doesn't consider Unicode characters (which is baffling considering Rust treats strings as Unicode).
let mut s = "ボルテックス".to_string();
s.truncate(4);
thread '' panicked at 'assertion failed: self.is_char_boundary(new_len)'
Additionally, truncate modifies the original string, which is not always desired.
The best I've come up with is to convert to chars and collect into a String.
fn truncate(s: String, max_width: usize) -> String {
    s.chars().take(max_width).collect()
}
e.g.
fn main() {
    assert_eq!(truncate("ボルテックス".to_string(), 0), "");
    assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
    assert_eq!(truncate("hello".to_string(), 4), "hell");
}
However, this feels very heavy-handed.
Make sure you read and understand delnan's point:
Unicode is freaking complicated. Are you sure you want char (which corresponds to code points) as unit and not grapheme clusters?
The rest of this answer assumes you have a good reason for using char and not graphemes.
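To make the distinction concrete, here is a minimal sketch using only the standard library. The helper name first_chars is made up for this illustration. It shows that a single user-perceived character can consist of more than one char, so truncating by char can split it apart:

```rust
/// Hypothetical helper: keeps at most `n` chars (Unicode scalar values) of `s`.
fn first_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Hypothetical helper: counts chars, not grapheme clusters.
fn char_count(s: &str) -> usize {
    s.chars().count()
}
```

With these, "e\u{301}" (an "e" followed by a combining acute accent, displayed as one character) has a char count of 2, and first_chars("e\u{301}", 1) yields a bare "e" with the accent dropped. If that matters for your input, segmenting by grapheme cluster is the usual alternative; the standard library does not provide it, and a third-party crate such as unicode-segmentation is commonly used for that.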
which is baffling considering Rust treats strings as Unicode
This is not correct; Rust treats strings as UTF-8. In UTF-8, each code point is encoded as one to four bytes, so the byte offset of the Nth character depends on the characters before it. There's no O(1) algorithm to convert "6 characters" to "N bytes", so the standard library doesn't hide that from you.
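A small sketch of why the byte index passed to String::truncate matters. The helper names below are invented for this illustration; the only standard library calls used are str::len, str::chars, and str::is_char_boundary:

```rust
/// Hypothetical helper: returns (length in bytes, length in chars) of `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Hypothetical helper: lists every byte offset in 0..=s.len() at which
/// `s` may be cut without splitting a UTF-8 encoded character.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For "ボルテックス", byte_and_char_len gives (18, 6): each of these katakana characters takes three bytes. The valid cut points are therefore 0, 3, 6, and so on, which is why truncating at byte 4 trips the is_char_boundary assertion.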
You can use char_indices to step through the string character by character and get the byte index of that character:
fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}
fn main() {
    assert_eq!(truncate("ボルテックス", 0), "");
    assert_eq!(truncate("ボルテックス", 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス", 100), "ボルテックス");
    assert_eq!(truncate("hello", 4), "hell");
}
This also returns a slice that you can choose to move into a new allocation if you need to, or mutate a String in place:
// May not be as efficient as inlining the code...
fn truncate_in_place(s: &mut String, max_chars: usize) {
    let bytes = truncate(&s, max_chars).len();
    s.truncate(bytes);
}
fn main() {
    let mut s = "ボルテックス".to_string();
    truncate_in_place(&mut s, 0);
    assert_eq!(s, "");
}
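For comparison, here is one way the inlined version hinted at in the comment might look. This is a sketch, not a measured improvement; the function name truncate_in_place_inlined is made up here, and whether it is actually faster would need benchmarking:

```rust
/// Hypothetical inlined variant: truncates `s` to at most `max_chars` chars
/// without going through an intermediate &str slice.
fn truncate_in_place_inlined(s: &mut String, max_chars: usize) {
    // Find the byte offset where the char at index `max_chars` starts, if any.
    // The shared borrow of `s` ends with this statement.
    let cut = s.char_indices().nth(max_chars).map(|(idx, _)| idx);

    // If the string has `max_chars` chars or fewer, there is nothing to cut.
    if let Some(idx) = cut {
        s.truncate(idx);
    }
}
```

Because the offset comes from char_indices, it always lies on a char boundary, so String::truncate does not panic here.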