 

Converting a Vec<u32> to Vec<u8> in-place and with minimal overhead

I'm trying to convert a Vec of u32s to a Vec of u8s, preferably in-place and without too much overhead.

My current solution relies on unsafe code to re-construct the Vec. Is there a better way to do this, and what are the risks associated with my solution?

use std::mem;
use std::vec::Vec;

fn main() {
    let mut vec32 = vec![1u32, 2];
    let vec8;
    unsafe {
        let length = vec32.len() * 4; // size of u32 = 4 * size of u8
        let capacity = vec32.capacity() * 4; // ^
        let mutptr = vec32.as_mut_ptr() as *mut u8;
        mem::forget(vec32); // don't run the destructor for vec32

        // construct new vec
        vec8 = Vec::from_raw_parts(mutptr, length, capacity);
    }

    println!("{:?}", vec8)
}

asked Apr 06 '18 10:04 by Thom Wiggers


2 Answers

  1. Whenever writing an unsafe block, I strongly encourage people to include a comment on the block explaining why you think the code is actually safe. That type of information is useful for the people who read the code in the future.

  2. Instead of adding comments about the "magic number" 4, use mem::size_of::<u32>(). I'd even go so far as to use size_of for u8 as well and perform the division, for maximum clarity.

  3. You can return the newly-created Vec from the unsafe block.

  4. As mentioned in the comments, "dumping" a block of data like this makes the data format platform-dependent: you will get different answers on little-endian and big-endian systems. This can lead to massive debugging headaches in the future. File formats either encode the platform endianness into the file (making the reader's job harder) or only write a specific endianness to the file (making the writer's job harder).

  5. I'd probably move the whole unsafe block to a function and give it a name, just for organization purposes.

  6. You don't need to import Vec, it's in the prelude.

use std::mem;

fn main() {
    let mut vec32 = vec![1u32, 2];

    // I copy-pasted this code from StackOverflow without reading the answer 
    // surrounding it that told me to write a comment explaining why this code 
    // is actually safe for my own use case.
    let vec8 = unsafe {
        let ratio = mem::size_of::<u32>() / mem::size_of::<u8>();

        let length = vec32.len() * ratio;
        let capacity = vec32.capacity() * ratio;
        let ptr = vec32.as_mut_ptr() as *mut u8;

        // Don't run the destructor for vec32
        mem::forget(vec32);

        // Construct new Vec
        Vec::from_raw_parts(ptr, length, capacity)
    };

    println!("{:?}", vec8)
}

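Point 4 can be seen with the integer byte-conversion methods in the standard library. The following is a small sketch: to_ne_bytes yields the bytes in native order (the same order that reinterpreting the Vec's memory exposes), while to_le_bytes and to_be_bytes fix the order explicitly. The helper name native_bytes is hypothetical.

```rust
/// Hypothetical helper: the bytes of each value in native byte order,
/// i.e. what reinterpreting a Vec<u32>'s memory as u8 would expose.
fn native_bytes(values: &[u32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_ne_bytes()).collect()
}

fn main() {
    let value: u32 = 0x0102_0304;

    // Explicit byte orders produce the same bytes on every platform.
    assert_eq!(value.to_le_bytes(), [0x04, 0x03, 0x02, 0x01]);
    assert_eq!(value.to_be_bytes(), [0x01, 0x02, 0x03, 0x04]);

    // Native order matches one of them, depending on the target.
    let native = native_bytes(&[value]);
    if cfg!(target_endian = "little") {
        assert_eq!(native, value.to_le_bytes());
    } else {
        assert_eq!(native, value.to_be_bytes());
    }
    println!("native bytes on this target: {:?}", native);
}
```

Writing the data with an explicit order (to_le_bytes or to_be_bytes) is what makes the resulting bytes portable between machines.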

My biggest worry about this code is the alignment of the memory associated with the Vec.

Rust's underlying allocator allocates and deallocates memory with a specific Layout, which records the size and alignment of the allocated block.

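This code needs the Layout to match between paired calls to alloc and dealloc. Dropping the Vec<u8> constructed from a Vec<u32> tells the allocator the wrong alignment, since that information is derived from the element type. The documentation for Vec::from_raw_parts lists this as a safety requirement (T must have the same alignment as what ptr was allocated with), so the conversion in the question is undefined behavior.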
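The mismatch can be observed with std::alloc::Layout. This sketch compares the Layout describing 2 u32 elements with the Layout describing the same 8 bytes as u8 elements, assuming (as the paragraph above does) that a Vec's allocation Layout is derived from its element type. The helper name buffer_layouts is hypothetical.

```rust
use std::alloc::Layout;
use std::mem;

/// Hypothetical helper: the Layout for `cap` u32 elements and the Layout
/// for the same number of bytes treated as u8 elements.
fn buffer_layouts(cap: usize) -> (Layout, Layout) {
    let as_u32 = Layout::array::<u32>(cap).expect("capacity overflow");
    let as_u8 = Layout::array::<u8>(cap * mem::size_of::<u32>()).expect("capacity overflow");
    (as_u32, as_u8)
}

fn main() {
    let (as_u32, as_u8) = buffer_layouts(2);

    // The sizes agree: both describe 8 bytes.
    assert_eq!(as_u32.size(), as_u8.size());

    // The alignments differ on any target where u32 needs more than
    // 1-byte alignment (4 on common targets such as x86_64 and aarch64).
    assert_eq!(as_u32.align(), mem::align_of::<u32>());
    assert_eq!(as_u8.align(), 1);

    println!("{:?}\n{:?}", as_u32, as_u8);
}
```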

The safest thing to do is to leave the Vec<u32> as-is and borrow it as a &[u8]. A slice has no interaction with the allocator, which avoids this problem.
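A minimal sketch of that suggestion, using std::slice::from_raw_parts to borrow the Vec<u32>'s contents as bytes. The helper name as_bytes is hypothetical; crates such as bytemuck provide a checked cast_slice for the same purpose, but this version sticks to the standard library. The bytes come out in native order, so the endianness caveat from point 4 still applies.

```rust
use std::mem;
use std::slice;

/// Hypothetical helper: borrow u32 data as raw bytes in native byte order.
fn as_bytes(data: &[u32]) -> &[u8] {
    // SAFETY: the pointer comes from a valid &[u32], so it is non-null and
    // valid for reads of len * size_of::<u32>() bytes for as long as `data`
    // is borrowed; u8 has alignment 1, so any address is suitably aligned;
    // every byte of a u32 is initialized and any bit pattern is a valid u8.
    unsafe {
        slice::from_raw_parts(
            data.as_ptr() as *const u8,
            data.len() * mem::size_of::<u32>(),
        )
    }
}

fn main() {
    let vec32 = vec![1u32, 2];
    // vec32 keeps ownership of its allocation; the slice only borrows it,
    // so nothing is ever deallocated with a mismatched Layout.
    let bytes = as_bytes(&vec32);
    println!("{:?}", bytes);
}
```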

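Even without interacting with the allocator, you need to be careful about alignment: viewing u32 data as u8 is always fine because u8 has an alignment of 1, but viewing u8 data as u32 requires the start address to be suitably aligned for u32.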
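For the u8-to-u32 direction, one option is the slice method align_to, which splits a slice into an unaligned prefix, an aligned middle, and an unaligned suffix. A minimal sketch; the helper name split_aligned is hypothetical. The standard library allows align_to to return all of the data in the prefix, so code should not rely on the middle being as large as possible.

```rust
/// Hypothetical helper: view as much of `bytes` as is suitably aligned as u32.
fn split_aligned(bytes: &[u8]) -> (&[u8], &[u32], &[u8]) {
    // SAFETY: any 4-byte bit pattern is a valid u32, and align_to only
    // places properly aligned elements in the middle slice.
    unsafe { bytes.align_to::<u32>() }
}

fn main() {
    let bytes: Vec<u8> = (0u8..16).collect();

    // Skipping one byte usually leaves the start misaligned for u32.
    let (prefix, middle, suffix) = split_aligned(&bytes[1..]);

    // Every input byte ends up in exactly one of the three parts.
    assert_eq!(prefix.len() + middle.len() * 4 + suffix.len(), 15);
    println!("prefix: {:?}\nmiddle: {:?}\nsuffix: {:?}", prefix, middle, suffix);
}
```

The u32 values in the middle slice are read in native byte order, the same as in the u32-to-u8 direction.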

See also:

  • How to slice a large Vec<i32> as &[u8]?
  • https://stackoverflow.com/a/48309116/155423
answered Oct 02 '22 20:10 by Shepmaster


If converting in place is not mandatory, something like this gives you control over the byte order and avoids the unsafe block:

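// Requires the byteorder crate as a dependency in Cargo.toml.
// (`extern crate` is only needed on the 2015 edition.)
extern crate byteorder;

use byteorder::{BigEndian, WriteBytesExt};

fn main() {
    let vec32: Vec<u32> = vec![0xaabbccdd, 2];
    let mut vec8: Vec<u8> = Vec::with_capacity(vec32.len() * 4);

    for elem in vec32 {
        // Writing into a Vec<u8> never returns an I/O error, so unwrap is fine here.
        vec8.write_u32::<BigEndian>(elem).unwrap();
    }

    println!("{:?}", vec8);
}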
answered Oct 02 '22 20:10 by attdona
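The same copy-based, big-endian conversion from attdona's answer can also be written with only the standard library, using u32::to_be_bytes instead of the byteorder crate. A sketch; the helper name to_be_byte_vec is hypothetical.

```rust
use std::mem;

/// Hypothetical helper: copy u32 values into a new Vec<u8> in big-endian order.
fn to_be_byte_vec(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * mem::size_of::<u32>());
    for v in values {
        // to_be_bytes always yields the most significant byte first,
        // regardless of the platform's native endianness.
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}

fn main() {
    let vec32: Vec<u32> = vec![0xaabbccdd, 2];
    let vec8 = to_be_byte_vec(&vec32);
    println!("{:?}", vec8); // [170, 187, 204, 221, 0, 0, 0, 2]
}
```

Like the byteorder version, this copies into a fresh allocation, so it trades the in-place requirement for safe code and a portable byte order.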