
How to read/write integer values from bytes without old_io?

There are convenient traits, Reader and Writer, in the std::old_io module for reading and writing integer values in various endiannesses. But that module is deprecated, so I'm trying to figure out other ways to do that.

One way is to read bytes and construct the result value with bit arithmetic. Is there another way in the standard library? For example, to read a u64 from a &[u8] where it's encoded in big-endian byte order. In C, I would memcpy 8 bytes from a uint8_t array into a uint64_t value and then perform something like htons to swap the bytes if necessary.
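For readers on a current toolchain: since Rust 1.32 the integer types provide from_be_bytes / from_le_bytes / from_ne_bytes (and the matching to_*_bytes), and std::io::Read / std::io::Write took over the role of the old_io traits. The following is a minimal sketch under those assumptions. The helper names read_u64_be, read_u64_be_swap, read_u32_be and write_u32_be are hypothetical, not standard-library functions; read_u64_be_swap mirrors the C "memcpy then swap" approach using u64::from_be.

```rust
use std::convert::TryInto;
use std::io::{self, Read, Write};

// Hypothetical helper: decode a big-endian u64 from the first 8 bytes of a slice.
// Returns None if the slice is shorter than 8 bytes.
fn read_u64_be(bytes: &[u8]) -> Option<u64> {
    let arr: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(u64::from_be_bytes(arr))
}

// Hypothetical helper, C-style: copy the bytes in native order (like memcpy),
// then convert from big-endian to host order (a no-op on big-endian hosts).
fn read_u64_be_swap(bytes: &[u8]) -> Option<u64> {
    let arr: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(u64::from_be(u64::from_ne_bytes(arr)))
}

// Hypothetical stream helpers: read/write a big-endian u32 via std::io.
fn read_u32_be<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?; // errors if fewer than 4 bytes are available
    Ok(u32::from_be_bytes(buf))
}

fn write_u32_be<W: Write>(w: &mut W, v: u32) -> io::Result<()> {
    w.write_all(&v.to_be_bytes())
}

fn main() -> io::Result<()> {
    let data = [0x01u8, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
    // Both decodings agree regardless of the host's byte order.
    assert_eq!(read_u64_be(&data), read_u64_be_swap(&data));
    println!("{:#x}", read_u64_be(&data).unwrap()); // 0x102030405060708

    // Round-trip a u32 through an in-memory stream (Vec<u8> implements Write).
    let mut buf: Vec<u8> = Vec::new();
    write_u32_be(&mut buf, 0x1234_5678)?;
    println!("{:?}", buf); // [18, 52, 86, 120]
    let mut reader: &[u8] = &buf; // &[u8] implements Read
    println!("{:#x}", read_u32_be(&mut reader)?); // 0x12345678
    Ok(())
}
```

Returning Option from the slice helpers keeps a short input from panicking; read_exact plays the same role for streams by returning an error.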

vbezhenar asked Nov 29 '22 10:11


1 Answer

It is very easy to convert an integer value into an array/slice, which can then be written to a file stream; as you say above, this is done with bit arithmetic. However, I wanted to post here so that people understand that the bit-arithmetic approach (used below, and already mentioned by the original poster) optimizes very well, at least on x86_64 with optimizations enabled: it compiles down to essentially the same code as the memcpy operation the original poster describes.

For example, take a look at this code:

```rust
// little endian
#[inline]
fn u16tou8ale(v: u16) -> [u8; 2] {
    [
        v as u8,
        (v >> 8) as u8,
    ]
}

// little endian
#[inline]
fn u32tou8ale(v: u32) -> [u8; 4] {
    [
        v as u8,
        (v >> 8) as u8,
        (v >> 16) as u8,
        (v >> 24) as u8,
    ]
}

// big endian
#[inline]
fn u32tou8abe(v: u32) -> [u8; 4] {
    [
        (v >> 24) as u8,
        (v >> 16) as u8,
        (v >> 8) as u8,
        v as u8,
    ]
}

fn main() {
    println!("{:?}", u32tou8ale(0x12345678)); // [120, 86, 52, 18]
    println!("{:?}", u32tou8abe(0x12345678)); // [18, 52, 86, 120]
}
```
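On a modern toolchain (Rust 1.32 or later), the standard methods to_le_bytes and to_be_bytes produce exactly the same arrays as these hand-written functions, on any host. A quick check, reusing the answer's function names:

```rust
// Hand-written little-endian encoder from the answer above.
fn u32tou8ale(v: u32) -> [u8; 4] {
    [v as u8, (v >> 8) as u8, (v >> 16) as u8, (v >> 24) as u8]
}

// Hand-written big-endian encoder from the answer above.
fn u32tou8abe(v: u32) -> [u8; 4] {
    [(v >> 24) as u8, (v >> 16) as u8, (v >> 8) as u8, v as u8]
}

fn main() {
    let v: u32 = 0x12345678;
    // The standard library methods give the same bytes regardless of host order.
    assert_eq!(u32tou8ale(v), v.to_le_bytes());
    assert_eq!(u32tou8abe(v), v.to_be_bytes());
    println!("{:?}", v.to_le_bytes()); // [120, 86, 52, 18]
    println!("{:?}", v.to_be_bytes()); // [18, 52, 86, 120]
}
```

Either form states the byte order explicitly, which is what makes the result portable.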

With optimizations enabled, the function u32tou8ale, for example, compiles down to a single store instruction that writes the [u8; 4] array onto the stack. The big-endian version u32tou8abe needs at most one extra instruction on top of that: a bswap before the store, or a single movbe on CPUs that support it. This is the optimizer's work. You might object that this only happens because the argument is a compile-time constant, but if you experiment you will find that even when the compiler cannot know the u32 value ahead of time, it still builds the array on the stack with a single move, essentially a memory copy. For example:

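```rust
// Reuses u32tou8ale from above.
fn main() {
    // Reinterpret main's address as a pointer to u32: the bytes there are
    // machine code, a value the compiler cannot know ahead of time.
    let p = main as fn() as *const u32;
    // Function addresses need not be 4-byte aligned, so read unaligned.
    let v = unsafe { p.read_unaligned() };
    println!("{:?}", u32tou8ale(v));
}
```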

This reads a value from the memory location referenced by the symbol main, which is our function. The compiler cannot know this value, so it issues a move instruction that reads the value onto the stack and then treats that value as a [u8; 4].
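A less fragile way to hide a value from the optimizer, assuming Rust 1.66 or later, is std::hint::black_box. Its documentation describes it as a best-effort hint rather than a guarantee, so treat this as a sketch for inspecting the generated assembly (for example with `cargo rustc --release -- --emit asm`), not as proof of what any particular compiler version emits:

```rust
use std::hint::black_box;

// Little-endian encoder from the answer above.
#[inline]
fn u32tou8ale(v: u32) -> [u8; 4] {
    [v as u8, (v >> 8) as u8, (v >> 16) as u8, (v >> 24) as u8]
}

fn main() {
    // black_box asks the optimizer to treat the value as opaque, so the
    // constant cannot simply be folded into the printed array.
    let v: u32 = black_box(0x1234_5678);
    let bytes = u32tou8ale(v);
    println!("{:?}", bytes); // [120, 86, 52, 18]
}
```

Unlike reading main's machine code, this needs no unsafe code and does not depend on code pages being readable.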

As for portability, simply be explicit about the byte order you read and write in, and everything will work out. For example, if you use u32tou8ale you get little-endian byte order no matter which architecture you target, and if you write the equivalent read function so that it explicitly reads big-endian, you can be sure the value is read in that order.
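The answer does not show the equivalent read functions, so here is a minimal sketch of them. The names u8atou32le and u8atou32be are hypothetical, chosen to mirror the answer's encoders; the round trip shows that reading in the same explicit order recovers the value, while reading in the wrong order yields a byte-swapped one:

```rust
// Hypothetical decoder: 4 little-endian bytes -> u32, independent of host order.
#[inline]
fn u8atou32le(b: [u8; 4]) -> u32 {
    (b[0] as u32) | ((b[1] as u32) << 8) | ((b[2] as u32) << 16) | ((b[3] as u32) << 24)
}

// Hypothetical decoder: 4 big-endian bytes -> u32, independent of host order.
#[inline]
fn u8atou32be(b: [u8; 4]) -> u32 {
    ((b[0] as u32) << 24) | ((b[1] as u32) << 16) | ((b[2] as u32) << 8) | (b[3] as u32)
}

// Encoders from the answer above.
fn u32tou8ale(v: u32) -> [u8; 4] {
    [v as u8, (v >> 8) as u8, (v >> 16) as u8, (v >> 24) as u8]
}

fn u32tou8abe(v: u32) -> [u8; 4] {
    [(v >> 24) as u8, (v >> 16) as u8, (v >> 8) as u8, v as u8]
}

fn main() {
    let v: u32 = 0x12345678;
    // Write and read in the same explicit order: the value survives.
    assert_eq!(u8atou32le(u32tou8ale(v)), v);
    assert_eq!(u8atou32be(u32tou8abe(v)), v);
    // Mixing orders produces the byte-swapped value instead.
    assert_eq!(u8atou32be(u32tou8ale(v)), v.swap_bytes());
    println!("{:#x}", u8atou32be([0x12, 0x34, 0x56, 0x78])); // 0x12345678
}
```

These agree with u32::from_le_bytes and u32::from_be_bytes, which are the standard-library equivalents.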

I hope this helps anyone who comes here looking to convert integers to and from bytes!

kmcguire answered Dec 06 '22 06:12