
C++. reinterpret_cast from double to unsigned char*

I was having a small play around with C++ today and came across this which I thought was odd, but perhaps more likely due to a misunderstanding by me and lack of pure C coding recently.

What I was looking to do originally was convert a double into an array of unsigned chars. My understanding was that the 64 bits of the double (sizeof(double) is 8) would now be represented as 8 8-bit chars. To do this I was using reinterpret_cast.

So here's some code to convert from double to char array, or at least I thought that's what it was doing. The problem was that strlen returned 15 instead of 8, and I'm not sure why.

double d = 0.3;

unsigned char *c = reinterpret_cast<unsigned char*> ( &d );

std::cout << strlen( (char*)c ) << std::endl;

So the strlen was my first issue. But then I tried the following and found that it returned 11, 19, 27, 35. The difference between these numbers is 8, so on some level something right is going on. But why does this not return 15, 15, 15, 15 (as it was returning 15 in the code above)?

double d = 0.3;
double d1 = 0.3;
double d2 = 0.3;
double d3 = 0.3;

unsigned char *c_d = reinterpret_cast<unsigned char*> ( &d );
unsigned char *c_d1 = reinterpret_cast<unsigned char*> ( &d1 );
unsigned char *c_d2 = reinterpret_cast<unsigned char*> ( &d2 );
unsigned char *c_d3 = reinterpret_cast<unsigned char*> ( &d3 );

std::cout << strlen( (char*)c_d ) << std::endl;
std::cout << strlen( (char*)c_d1 ) << std::endl;
std::cout << strlen( (char*)c_d2 ) << std::endl;
std::cout << strlen( (char*)c_d3 ) << std::endl;

So I looked at the addresses of the chars and they are:

0x28fec4
0x28fec0
0x28febc
0x28feb8 

Now this makes sense given that the size of an unsigned char* on my system is 4 bytes, but I thought the correct amount of memory would be allocated from the cast, otherwise it seems like reinterpret_cast is a pretty dangerous thing... Furthermore if I do

for (int i = 0; i < 4; ++i) {
    double d = 0.3;

    unsigned char *c = reinterpret_cast<unsigned char*> ( &d );

    std::cout << strlen( (char*)c ) << std::endl;
}

This prints 11, 11, 11, 11!

So what is going on here? Clearly memory is getting overwritten in places and reinterpret_cast is not working as I thought it would (i.e. I'm using it wrong). Having been using strings for so long in C++, sometimes when you go back to raw char arrays, you forget these things.

So I suppose this is a 3 part question.

Why was strlen initially returning 15?
Why did the 4 strlen calls grow in size?
Why did the loop return 11, 11, 11, 11?

Thanks.

Muckle_ewe asked Apr 03 '13 15:04


2 Answers

strlen works by iterating through the array that it assumes the passed const char* points at until it finds a char with value 0. This is the null-terminating character that is automatically added to the end of string literals. The bytes that make up the value representation of your double do not end with a null character, so strlen will just keep going past the end of your double object until it finds a byte with value 0.

Consider the string literal "Hello". In memory, with an ASCII compatible execution character set, this will be stored as the following bytes (in hexadecimal):

48 65 6c 6c 6f 00

strlen would read through each of them until it found the byte with value 0 and report how many bytes it has seen so far.
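To make that concrete, here is a minimal sketch of the counting loop. The name my_strlen is hypothetical and this is not how any particular standard library implements strlen; it only shows the "scan until a 0 byte" idea.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only, not the real library code.
// Counts bytes starting at s until the first byte with value 0.
// If no 0 byte exists inside the object s points into, the loop reads
// past the end of that object, which is undefined behaviour.
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

Calling my_strlen("Hello") stops at the sixth byte (the 00) and returns 5. Nothing in the loop knows or cares how large the pointed-to object really is.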

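The IEEE 754 double precision representation of 0.3 is (most significant byte first; on a little-endian machine such as x86 the bytes sit in memory in the reverse order):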

3F D3 33 33 33 33 33 33

As you can see, there is no byte with value 0, so strlen just won't know when to stop.

Whatever value the function returns is probably just how far it got until it found a 0 in memory, but you've already hit undefined behaviour and so making any guesses about it is pointless.
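If the aim is just to look at the bytes of the double, a loop bounded by sizeof(double) does it without strlen. This is a sketch under assumptions: bytes_of is a hypothetical helper name, and the byte order of the result depends on the machine's endianness.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the object representation of a double as
// space-separated hex bytes, in the order they sit in memory.
std::string bytes_of(const double& d) {
    // Inspecting any object through unsigned char* is permitted.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&d);
    std::string out;
    char buf[4];  // two hex digits, one space, terminating 0
    for (std::size_t i = 0; i < sizeof(double); ++i) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(p[i]));
        out += buf;
    }
    return out;
}
```

The loop stops after exactly sizeof(double) bytes because the code supplies the length itself. On a little-endian IEEE 754 machine I would expect bytes_of(0.3) to list the bytes shown above in reverse order.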

Joseph Mansfield answered Sep 18 '22 09:09


Your problem is your use of strlen( (char*)c ), because strlen expects a pointer to a null-terminated character string.

It seems like you're expecting some sort of "boundary" between the 8th and 9th byte, since those first 8 bytes were originally a double.

That information is lost once you've cast that memory to a char*. It becomes the responsibility of your code to know how many chars are valid.
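One way to keep that knowledge attached to the data is to copy the bytes into a fixed-size array, so the length travels with the type. A minimal sketch, assuming the hypothetical names DoubleBytes, to_bytes and from_bytes (none of them come from the question):

```cpp
#include <array>
#include <cstring>

// Hypothetical alias: exactly as many bytes as a double occupies.
using DoubleBytes = std::array<unsigned char, sizeof(double)>;

// Copies the object representation of d into a sized byte array.
DoubleBytes to_bytes(double d) {
    DoubleBytes out{};
    std::memcpy(out.data(), &d, sizeof d);
    return out;
}

// Rebuilds the double from the same bytes.
double from_bytes(const DoubleBytes& in) {
    double d;
    std::memcpy(&d, in.data(), sizeof d);
    return d;
}
```

Here to_bytes(0.3).size() is sizeof(double), and from_bytes(to_bytes(0.3)) gives back the same value. No terminator is needed because the size is part of the array type.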

Drew Dormann answered Sep 18 '22 09:09