When printing a u8 array in Rust with println!("{:?}", some_u8_slice);, the numeric values are printed (as they should be).

What is the most direct way to format the characters as-is into the string without assuming any particular encoding? Something like iterating over the byte string and writing each character to stdout (without so much hassle).

Can this be done using Rust's format!? Otherwise, what's the most convenient way to print a u8 slice?
From the documentation of core::str::from_utf8: Converts a slice of bytes to a string slice. A string slice (&str) is made of bytes (u8), and a byte slice (&[u8]) is made of bytes, so this function converts between the two. Not all byte slices are valid string slices, however: &str requires that it is valid UTF-8.
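A small sketch of the two possible outcomes (as_text is a hypothetical wrapper, not part of std). On success the returned &str borrows the very same bytes without copying them; on failure, Utf8Error::valid_up_to reports how many leading bytes were valid UTF-8.

```rust
use std::str;

// Hypothetical wrapper, only here to give the operation a name.
fn as_text(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let good: &[u8] = b"hello";
    let bad: &[u8] = b"he\xFFllo";

    // Success: the &str is a view of the same bytes, no copy is made.
    let s = as_text(good).unwrap();
    println!("{} (same address: {})", s, s.as_ptr() == good.as_ptr());

    // Failure: valid_up_to() tells how many leading bytes were valid UTF-8.
    match as_text(bad) {
        Ok(s) => println!("{}", s),
        Err(e) => println!("invalid UTF-8 after {} valid bytes", e.valid_up_to()),
    }
}
```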
Rust's character and string types are designed around Unicode. A String is not a sequence of ASCII chars; it is a sequence of Unicode characters stored as UTF-8 bytes. A Rust char is a 32-bit value holding a Unicode scalar value.
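A quick sketch of that difference: every char takes 4 bytes in memory, while the same character inside a string takes 1 to 4 bytes of UTF-8. The describe helper is hypothetical and only used for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: a char's scalar value and its UTF-8 encoding.
fn describe(c: char) -> (u32, Vec<u8>) {
    (c as u32, c.to_string().into_bytes())
}

fn main() {
    // Every char occupies 4 bytes in memory, whatever character it holds.
    println!("size_of::<char>() = {}", size_of::<char>());

    // Stored in a String, the same characters take a variable number of bytes.
    for c in ['h', 'é', '☃'] {
        let (code, utf8) = describe(c);
        println!("{} = U+{:04X} -> {:?}", c, code, utf8);
    }
}
```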
If I can't assume a particular encoding, the way I normally do it is with the std::ascii::escape_default function. It shows printable ASCII characters as they are and escapes everything else (as \n, \t, \xNN and so on). The downside is that you won't see any non-ASCII Unicode codepoints, even if portions of your string are valid UTF-8, but it does the job for most uses:
use std::ascii::escape_default;
use std::str;

fn show(bs: &[u8]) -> String {
    let mut visible = String::new();
    for &b in bs {
        // escape_default yields the escaped form of one byte as ASCII bytes.
        let part: Vec<u8> = escape_default(b).collect();
        // The escaped bytes are always ASCII, so this unwrap cannot fail.
        visible.push_str(str::from_utf8(&part).unwrap());
    }
    visible
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}
Output: foo\xe2\x98\x83bar\xffbaz
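If your toolchain is Rust 1.60 or newer (an assumption), byte slices also have an escape_ascii method that applies the same per-byte escaping, and the value it returns implements Display. That answers the format! part of the question directly, without the manual loop. A sketch:

```rust
// Assumes Rust 1.60+, where <[u8]>::escape_ascii is available.
fn show(bs: &[u8]) -> String {
    // EscapeAscii implements Display, so it works directly with format!.
    format!("{}", bs.escape_ascii())
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
    // It can also be printed inline without building a String first.
    println!("{}", bytes.escape_ascii());
}
```

For the sample bytes this should print the same escaped line as the loop version.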
Another approach is to lossily decode the contents into a string and print that. If there's any invalid UTF-8, you'll get a Unicode replacement character instead of hex escapes of the raw bytes, but you will get to see all valid UTF-8 encoded Unicode codepoints:
fn show(bs: &[u8]) -> String {
    // Invalid sequences become U+FFFD; into_owned() turns the Cow into a String.
    String::from_utf8_lossy(bs).into_owned()
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}
Output: foo☃bar�baz
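Note that String::from_utf8_lossy returns a Cow<str>, which already implements Display, so into_owned() is only needed if you really want an owned String. In this sketch (lossy is a hypothetical helper name), valid input is borrowed as-is and only invalid input allocates:

```rust
use std::borrow::Cow;

// Hypothetical helper: decode lossily without forcing a copy.
fn lossy(bs: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bs)
}

fn main() {
    let valid: &[u8] = b"snow\xE2\x98\x83man";
    let invalid: &[u8] = b"bar\xFFbaz";

    // Cow<str> implements Display, so it can be printed directly.
    println!("{}", lossy(valid));
    println!("{}", lossy(invalid));

    // Valid input is borrowed; invalid input needs a new String for U+FFFD.
    println!("valid borrowed: {}", matches!(lossy(valid), Cow::Borrowed(_)));
    println!("invalid owned: {}", matches!(lossy(invalid), Cow::Owned(_)));
}
```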
The simplest way is stdout().write_all(some_u8_slice) (with use std::io::Write; in scope, since write_all is a method of the Write trait). This simply outputs the bytes with no regard for their encoding, which is useful for binary data, or for text in some unknown encoding where you want to preserve the original encoding.
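A runnable sketch of that approach. The dump helper is hypothetical; making it generic over Write means the same code can target a locked stdout, a file, or an in-memory Vec<u8>.

```rust
use std::io::{self, Write};

// Hypothetical helper: copy bytes verbatim to any writer, no decoding at all.
fn dump<W: Write>(out: &mut W, bytes: &[u8]) -> io::Result<()> {
    out.write_all(bytes)?;
    out.flush()
}

fn main() -> io::Result<()> {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz\n";
    // Lock stdout once so the whole slice is written under a single lock.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    dump(&mut out, bytes)
}
```

What appears on screen depends on your terminal; the bytes themselves are passed through unchanged, which is what matters when redirecting to a file or pipe.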
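If you want to treat the data as a string and you know that the encoding is UTF-8 (or a UTF-8 subset like ASCII), then you can do this:

use std::str;

fn main() {
    // 0xFF can never appear in valid UTF-8, so this slice fails the check.
    let some_utf8_slice: &[u8] = &[104, 101, 0xFF, 108, 111];
    match str::from_utf8(some_utf8_slice) {
        Ok(s) => println!("{}", s),
        Err(e) => eprintln!("not valid UTF-8: {}", e),
    }
}

This checks that the data is valid UTF-8 before printing it. With the sample slice above, which contains 0xFF, the error branch runs instead.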
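The two ideas can be combined: print the text normally when it is valid UTF-8 and fall back to escaping otherwise, so nothing is silently dropped. This is a sketch with a hypothetical show helper that reuses std::ascii::escape_default from the first answer.

```rust
use std::ascii::escape_default;
use std::str;

// Hypothetical helper: valid UTF-8 is shown as text, anything else is escaped.
fn show(bs: &[u8]) -> String {
    match str::from_utf8(bs) {
        // Valid: return the decoded text unchanged.
        Ok(s) => s.to_owned(),
        // Invalid: escape every byte (the escapes are plain ASCII chars).
        Err(_) => bs
            .iter()
            .flat_map(|&b| escape_default(b))
            .map(char::from)
            .collect(),
    }
}

fn main() {
    println!("{}", show(&[104, 101, 108, 108, 111]));
    println!("{}", show(&[104, 101, 0xFF, 108, 111]));
}
```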
If you just want to shovel the raw bytes, unescaped, to stdout (especially useful when the output is redirected to a pipe or a file), the following should do the job inside a function that returns std::io::Result:

use std::io::Write; // brings write_all and flush into scope

let mut out = std::io::stdout();
out.write_all(slice)?;
out.flush()?;
The flush is necessary because stdout is buffered: if write_all is immediately followed by a program abort, bytes still sitting in the buffer may never reach the underlying file descriptor.