
How to print a u8 slice as text if I don't care about the particular encoding?

Printing a u8 slice in Rust with println!("{:?}", some_u8_slice); prints the numeric values (as it should).

What is the most direct way to format the characters as-is into the string without assuming any particular encoding?

Something like iterating over the byte string and writing each character to stdout (without so much hassle).

Can this be done using Rust's format!?

Otherwise what's the most convenient way to print a u8 slice?

ideasman42 asked Jan 03 '17 18:01




3 Answers

If I can't assume a particular encoding, the way I normally do it is with the std::ascii::escape_default function. Basically, it will show most ASCII characters as they are, and then escape everything else. The downside is that you won't see every possible Unicode codepoint even if portions of your string are correct UTF-8, but it does the job for most uses:

use std::ascii::escape_default;
use std::str;

fn show(bs: &[u8]) -> String {
    let mut visible = String::new();
    for &b in bs {
        let part: Vec<u8> = escape_default(b).collect();
        visible.push_str(str::from_utf8(&part).unwrap());
    }
    visible
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}

Output: foo\xe2\x98\x83bar\xffbaz
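The same escaping can be written more compactly. The sketch below shows two variants: an iterator chain over escape_default, and the slice method escape_ascii, which assumes Rust 1.60 or newer. The function names show_iter and show_escape_ascii are made up for this illustration.

```rust
use std::ascii::escape_default;

// Iterator-based equivalent of the `show` function above.
// (`show_iter` is a hypothetical name used only for this sketch.)
fn show_iter(bs: &[u8]) -> String {
    bs.iter()
        .flat_map(|&b| escape_default(b)) // each byte expands to one or more ASCII bytes
        .map(char::from) // the escaped output is always ASCII, so this is lossless
        .collect()
}

// Same escaping via the `escape_ascii` slice method (assumes Rust 1.60+).
fn show_escape_ascii(bs: &[u8]) -> String {
    bs.escape_ascii().to_string()
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show_iter(bytes));
    println!("{}", show_escape_ascii(bytes));
}
```

Both lines print foo\xe2\x98\x83bar\xffbaz, matching the output of the loop version.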

Another approach is to lossily decode the contents into a string and print that. If there's any invalid UTF-8, you'll get a Unicode replacement character instead of hex escapes of the raw bytes, but you will get to see all valid UTF-8 encoded Unicode codepoints:

fn show(bs: &[u8]) -> String {
    String::from_utf8_lossy(bs).into_owned()
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}

Output: foo☃bar�baz

BurntSushi5 answered Nov 12 '22 05:11


The simplest way is std::io::stdout().write_all(some_u8_slice), with the std::io::Write trait in scope. This simply outputs the bytes, with no regard for their encoding. It is useful for binary data, or for text in some unknown encoding where you want to preserve the original bytes.

If you want to treat the data as a string and you know that the encoding is UTF-8 (or a UTF-8 subset like ASCII) then you can do this:

use std::str;

fn main() {
    let some_utf8_slice = &[104, 101, 0xFF, 108, 111];
    if let Ok(s) = str::from_utf8(some_utf8_slice) {
        println!("{}", s);
    }
}

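This checks that the data is valid UTF-8 before printing it. The sample slice contains the byte 0xFF, which can never appear in UTF-8, so for this input the check fails and nothing is printed.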
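If silently printing nothing is not what you want, the error returned by from_utf8 says how far the valid data extends. A minimal sketch, assuming you want to show the valid prefix and report the rest; valid_prefix is a hypothetical helper, not part of the standard library:

```rust
use std::str;

// Hypothetical helper: returns the longest prefix of `bytes` that is valid UTF-8.
fn valid_prefix(bytes: &[u8]) -> &str {
    match str::from_utf8(bytes) {
        Ok(s) => s,
        // valid_up_to() is the index of the first byte that failed validation,
        // so everything before it is guaranteed to be valid UTF-8.
        Err(e) => str::from_utf8(&bytes[..e.valid_up_to()]).unwrap_or(""),
    }
}

fn main() {
    let bytes: &[u8] = &[104, 101, 0xFF, 108, 111];
    match str::from_utf8(bytes) {
        Ok(s) => println!("{}", s),
        Err(e) => eprintln!("invalid UTF-8 after {:?}: {}", valid_prefix(bytes), e),
    }
}
```

For the sample slice, the valid prefix is "he", since the 0xFF byte sits at index 2.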

mbrubeck answered Nov 12 '22 05:11


If you just want to shovel the raw bytes unescaped to stdout, which can be especially useful when the output is redirected to a pipe or a file, then the following should do the job:

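use std::io::Write; // brings write_all and flush into scope

// `slice` is the &[u8] to print; `?` needs an enclosing function that returns io::Result
let mut out = std::io::stdout();
out.write_all(slice)?;
out.flush()?;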

The flush is necessary since write_all immediately followed by a program abort can fail to deliver the bytes to the underlying file descriptor.
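For reference, a complete program along these lines might look as follows. This is a sketch under a few assumptions: dump is a hypothetical helper name, it is generic over any writer so the same code works for files or in-memory buffers, and the sample bytes are just illustrative.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes the bytes verbatim to any writer, then flushes.
fn dump<W: Write>(out: &mut W, bytes: &[u8]) -> io::Result<()> {
    out.write_all(bytes)?;
    out.flush()
}

fn main() -> io::Result<()> {
    // Sample data: valid UTF-8 mixed with a stray 0xFF byte.
    let slice = b"foo\xE2\x98\x83bar\xFFbaz\n";
    let stdout = io::stdout();
    // Locking once avoids re-acquiring the stdout lock on every write.
    let mut out = stdout.lock();
    dump(&mut out, slice)
}
```

Running it with the output redirected to a file should leave exactly the input bytes in that file, including the invalid 0xFF.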

the8472 answered Nov 12 '22 07:11