 

Convert a Vec<u16> or Vec<WCHAR> to a &str

Tags: string, rust

I'm getting into Rust programming to write a small program, and I'm a little bit lost in string conversions.

In my program, I have a vector as follows:

let mut name: Vec<winnt::WCHAR> = Vec::new(); 

WCHAR is the same as a u16 on my Windows machine.

I hand the Vec<u16> over to a C function (as a pointer), which fills it with data. I then need to convert the string contained in the vector into a &str. However, no matter what I try, I cannot get this conversion working.

The only thing I managed to get working is to convert it to a WideString:

let widestr = unsafe { WideCString::from_ptr_str(name.as_ptr()) };

But this seems to be a step into the wrong direction.

What is the best way to convert the Vec<u16> to a &str, under the assumption that the vector holds a valid, null-terminated string?

Norbert asked Aug 21 '16

2 Answers

I then need to convert the string contained in the vector into a &str. However, no matter, what I try, I can not manage to get this conversion working.

There's no way of making this a "free" conversion.

A &str is a Unicode string encoded with UTF-8. This is a byte-oriented encoding. If you have UTF-16 (or the different but common UCS-2 encoding), there's no way to read one as the other. That's equivalent to trying to read a JPEG image as a PDF. Both chunks of data might be a string, but the encoding is important.
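To make the difference concrete, here is a small sketch (string chosen for illustration) showing that the same text is represented by different code units in the two encodings:

```rust
fn main() {
    let s = "héllo";
    let utf8: Vec<u8> = s.bytes().collect();          // UTF-8 code units
    let utf16: Vec<u16> = s.encode_utf16().collect(); // UTF-16 code units

    // 'é' takes two bytes in UTF-8 but a single u16 in UTF-16,
    // so the two sequences cannot be reinterpreted as one another.
    assert_eq!(utf8.len(), 6);
    assert_eq!(utf16.len(), 5);

    println!("UTF-8:  {:02x?}", utf8);
    println!("UTF-16: {:04x?}", utf16);
}
```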

The first question is "do you really need to do that?". Many times, you can take data from one function and shovel it back into another function without ever looking at it. If you can get away with that, that might be the best answer.

If you do need to transform it, then you have to deal with the errors that can occur. An arbitrary array of 16-bit integers may not be valid UTF-16 or UCS-2. These encodings have edge cases that can easily produce invalid strings. Null-termination is another aspect - Unicode actually allows for embedded NUL characters, so a null-terminated string can't hold all possible Unicode characters!
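Both failure modes are easy to demonstrate; here is a quick sketch of a lone surrogate (invalid UTF-16) and an embedded NUL (perfectly valid Unicode):

```rust
fn main() {
    // 0xD800 is a lone high surrogate: not valid UTF-16 on its own.
    let bad = [0xD800u16];
    assert!(String::from_utf16(&bad).is_err());

    // An embedded NUL is a valid Unicode scalar value, so a
    // null-terminated representation cannot round-trip this string.
    let with_nul = [0x61u16, 0x00, 0x62];
    assert_eq!(String::from_utf16(&with_nul).unwrap(), "a\0b");
}
```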

Once you've ensured that the encoding is valid 1 and figured out how many entries in the input vector comprise the string, then you have to decode the input format and re-encode to the output format. This is likely to require some kind of new allocation, so you are most likely to end up with a String, which can then be used most anywhere a &str can be used.

There is a built-in method to convert UTF-16 data to a String: String::from_utf16. Note that it returns a Result to allow for these error cases. There's also String::from_utf16_lossy, which replaces invalid encoded parts with the Unicode replacement character.

let name = [0x68, 0x65, 0x6c, 0x6c, 0x6f]; // "hello" as UTF-16 code units

let a = String::from_utf16(&name);
let b = String::from_utf16_lossy(&name);

println!("{:?}", a); // Ok("hello")
println!("{:?}", b); // "hello"

If you are starting from a pointer to a u16 or WCHAR, you will need to convert to a slice first by using slice::from_raw_parts. If you have a null-terminated string, you need to find the NUL yourself and slice the input appropriately.
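Putting those pieces together, here is a sketch of the pointer case; the buffer contents are made up to stand in for whatever the C function wrote:

```rust
fn main() {
    // Pretend this buffer was filled by a C function:
    // "hi", a NUL terminator, then leftover garbage.
    let buf: Vec<u16> = vec![0x68, 0x69, 0x00, 0xFFFF];
    let ptr = buf.as_ptr();

    // Find the NUL terminator ourselves...
    let len = (0..buf.len())
        .take_while(|&i| unsafe { *ptr.add(i) } != 0)
        .count();

    // ...then view exactly that many code units as a slice.
    let units = unsafe { std::slice::from_raw_parts(ptr, len) };

    let s = String::from_utf16(units).expect("buffer was not valid UTF-16");
    assert_eq!(s, "hi");
}
```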


1: This is actually a great way of using types; a &str is guaranteed to be UTF-8 encoded, so no further check needs to be made. Similarly, the WideCString is likely to perform a check once upon construction and then can skip the check on later uses.

Shepmaster answered Oct 18 '22


This is my simple hack for this case. There may well be bugs; fix them for your own case:

let mut v = vec![0u16; MAX_PATH as usize];

// imaginary win32 function
win32_function(v.as_mut_ptr());

let mut path = String::new();
for val in v.iter() {
    // Keep only the low byte of each u16; this is only correct for ASCII data.
    let c: u8 = (*val & 0xFF) as u8;
    if c == 0 {
        // Stop at the NUL terminator.
        break;
    } else {
        path.push(c as char);
    }
}
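The loop above keeps only the low byte of each u16, so it silently mangles anything outside ASCII. A sketch of the same NUL-search idea that instead decodes the data as real UTF-16 (buffer contents are made up for illustration):

```rust
fn main() {
    // Hypothetical buffer: "héllo", NUL-terminated, with trailing junk.
    let mut v: Vec<u16> = "héllo".encode_utf16().collect();
    v.push(0);
    v.push(0xABCD);

    // Slice up to the first NUL, then decode the whole slice as UTF-16
    // instead of truncating each u16 to its low byte.
    let len = v.iter().position(|&c| c == 0).unwrap_or(v.len());
    let path = String::from_utf16_lossy(&v[..len]);
    assert_eq!(path, "héllo");
}
```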
sailfish009 answered Oct 18 '22