
Convert uint64_t to uint8_t[8]

Tags:

c++

boost

How can I convert uint64_t to uint8_t[8] without losing information in C++?

I tried the following:

uint64_t number = 23425432542254234532;
uint8_t result[8];
for(int i = 0; i < 8; i++) {
    std::memcpy(result[i], number, 1);
}
asked Feb 02 '16 12:02 by Jannik Becher


2 Answers

You are almost there. Firstly, the literal 23425432542254234532 is too big to fit in uint64_t.

Secondly, as you can see from the documentation, std::memcpy has the following declaration:

void * memcpy ( void * destination, const void * source, size_t num );

As you can see, it takes pointers (addresses) as arguments, not a uint64_t or a uint8_t. You can easily get the address of the integer with the address-of operator (&number).

Thirdly, you are only copying the first byte of the integer into each array element. You would need to advance the source pointer on every iteration. But the loop is unnecessary; you can copy all bytes in one go like this:

std::memcpy(result, &number, sizeof number);
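A minimal corrected sketch of the question's code follows. It assumes a shortened value, 2342543254225423453, since the original literal does not fit in uint64_t. It shows the single memcpy call next to a fixed version of the per-byte loop, in which both arguments are addresses and the source pointer advances by one byte each iteration. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy all eight bytes in one call, as suggested above.
void to_bytes_memcpy(std::uint64_t number, std::uint8_t (&result)[8]) {
    std::memcpy(result, &number, sizeof number);
}

// The question's loop, fixed: memcpy receives addresses, and the
// source pointer moves forward one byte per iteration.
void to_bytes_loop(std::uint64_t number, std::uint8_t (&result)[8]) {
    const std::uint8_t* src = reinterpret_cast<const std::uint8_t*>(&number);
    for (int i = 0; i < 8; i++) {
        std::memcpy(&result[i], src + i, 1);
    }
}
```

Both functions produce identical arrays on the same machine, and copying the array back into a uint64_t recovers the original value, so no information is lost.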

Do realize that the order of the bytes depends on the endianness of the CPU.
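To make the endianness dependence visible, here is a small sketch (both helper names are hypothetical) that probes the host's byte order and reports which byte memcpy places at result[0].

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: on a little-endian host the low byte of a
// 16-bit value is stored first in memory.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    std::uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Returns whatever byte memcpy puts at index 0 on this host.
std::uint8_t first_copied_byte(std::uint64_t number) {
    std::uint8_t result[8];
    std::memcpy(result, &number, sizeof number);
    return result[0];
}
```

For 0x0102030405060708, first_copied_byte returns 0x08 on a little-endian host such as x86_64 and 0x01 on a big-endian host.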

answered Nov 02 '22 05:11 by eerorika


First, do you want the conversion to be big-endian or little-endian? Most of the previous answers will start giving you the bytes in the opposite order, and break your program, as soon as you switch architectures.

If you need consistent results, you would want to convert your 64-bit input into big-endian (network) byte order, or perhaps to little-endian. For example, GLib provides the GUINT64_TO_BE() macro for this, and most compilers offer an equivalent built-in function.
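As a sketch of the built-in route, assuming GCC or Clang: both compilers provide __builtin_bswap64 and predefine the __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__ macros. The names to_big_endian and to_big_endian_bytes are hypothetical stand-ins for GUINT64_TO_BE() followed by a memcpy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns a value whose in-memory byte order is
// big-endian on any GCC or Clang target.
std::uint64_t to_big_endian(std::uint64_t x) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap64(x);  // little-endian host: reverse the bytes
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return x;                     // big-endian host: already in network order
#else
#error "this sketch assumes the GCC/Clang byte-order macros"
#endif
}

// Normalize first, then copy the bytes out with memcpy.
void to_big_endian_bytes(std::uint64_t x, std::uint8_t (&out)[8]) {
    const std::uint64_t be = to_big_endian(x);
    std::memcpy(out, &be, sizeof be);
}
```

Because the value is normalized before the copy, out[0] always holds the most significant byte, whatever the host architecture.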

Having done that, there are several alternatives:

Copy with memcpy() or memmove()

This is the method that the language standard guarantees will work, although here I use one macro from a third-party library (GLib, to convert the argument to big-endian byte order on all platforms). For example:

#include <stdint.h>
#include <string.h>  // memcpy

#include <glib.h>

union eight_bytes {
  uint64_t u64;
  uint8_t b8[sizeof(uint64_t)];
};

eight_bytes u64_to_eight_bytes( const uint64_t input )
{
  eight_bytes result;
  const uint64_t big_endian = (uint64_t)GUINT64_TO_BE((guint64)input);

  memcpy( &result.b8, &big_endian, sizeof(big_endian) );
  return result;
}

On Linux x86_64 with clang++ -std=c++17 -O, this compiles to essentially the instructions:

bswapq  %rdi
movq    %rdi, %rax
retq

If you wanted the results in little-endian order on all platforms, you could replace GUINT64_TO_BE() with GUINT64_TO_LE(), which eliminates the first instruction on x86_64, and then declare the function inline so that the remaining move and return disappear into the caller. (Or, if you’re certain that cross-platform compatibility does not matter, you might risk omitting the normalization entirely.)

So, with a modern 64-bit compiler, this code is just as efficient as any alternative. On other targets, it might not be.
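If GLib is not available, one standard-only alternative is to fill the union's b8 member with shifts. This is a different technique from the byte swap above, not the author's code, and the function names are hypothetical. Shifts operate on the value rather than its memory representation, so the resulting order does not depend on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

union eight_bytes {
  std::uint64_t u64;
  std::uint8_t b8[sizeof(std::uint64_t)];
};

// Hypothetical GLib-free big-endian variant: b8[0] receives the most
// significant eight bits on every platform.
eight_bytes u64_to_eight_bytes_be(const std::uint64_t input)
{
  eight_bytes result;
  for (std::size_t i = 0; i < sizeof(std::uint64_t); ++i) {
    result.b8[i] = static_cast<std::uint8_t>(input >> (56 - 8 * i));
  }
  return result;
}

// Hypothetical little-endian variant, declared inline: b8[0] receives
// the least significant eight bits on every platform.
inline eight_bytes u64_to_eight_bytes_le(const std::uint64_t input)
{
  eight_bytes result;
  for (std::size_t i = 0; i < sizeof(std::uint64_t); ++i) {
    result.b8[i] = static_cast<std::uint8_t>(input >> (8 * i));
  }
  return result;
}
```

Whether the optimizer collapses these loops into a single bswap or store is compiler-dependent, so it is worth checking the generated assembly if performance matters.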

Type-Punning

The common way to write this in C would be to declare the union as before, set its uint64_t member, and then read its uint8_t[8] member. This is legal in C.

I personally like it because it allows me to express the entire operation as static single assignments.

However, in C++, it is formally undefined behavior. In practice, every C++ compiler I’m aware of supports it for Plain Old Data types (the formal term in the language standard) of the same size with no padding bits, though not for more complicated classes that have virtual function tables and the like. It seems more likely to me that a future version of the Standard will officially support type-punning on POD types than that any important compiler will ever silently break it.

The C++ Guidelines Way

Bjarne Stroustrup recommended that, if you are going to type-pun instead of copying, you use reinterpret_cast, for example:

uint8_t (&array_of_bytes)[sizeof(uint64_t)] =
      *reinterpret_cast<uint8_t(*)[sizeof(uint64_t)]>(
        &proper_endian_uint64);

His reasoning was that both an explicit cast and type-punning through a union are undefined behavior, but the cast makes it blatant and unmistakable that you are shooting yourself in the foot on purpose, whereas reading a different union member than the active one can be a very subtle bug.
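A compilable version of that fragment, as a sketch: proper_endian_uint64 is given a hypothetical value, and the bytes seen through the reinterpret_cast array reference are compared with a memcpy copy. Mainstream compilers produce matching bytes here; the caveat above about formal undefined behavior still applies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical value standing in for an already-normalized input.
std::uint64_t proper_endian_uint64 = 0x0102030405060708ULL;

// Returns true when the reinterpret_cast view and a memcpy copy agree.
bool cast_view_matches_copy() {
    std::uint8_t (&array_of_bytes)[sizeof(std::uint64_t)] =
        *reinterpret_cast<std::uint8_t(*)[sizeof(std::uint64_t)]>(
            &proper_endian_uint64);

    std::uint8_t copied[sizeof(std::uint64_t)];
    std::memcpy(copied, &proper_endian_uint64, sizeof copied);

    return std::memcmp(array_of_bytes, copied, sizeof copied) == 0;
}
```

The cast version reads the integer's storage in place, while memcpy makes a separate copy; the comparison simply confirms that both see the same bytes in the same order.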

answered Nov 02 '22 07:11 by Davislor