
What is *(int*)&data[18] actually doing in this code?

I came across this syntax for reading a BMP file in C++:

#include <fstream>
int main() {
    std::ifstream in("filename.bmp", std::ifstream::binary);
    in.seekg(0, in.end);
    std::streamsize size = in.tellg();
    in.seekg(0);
    unsigned char * data = new unsigned char[size];
    in.read((char *)data, size);

    int width = *(int*)&data[18];
    // omitted remainder for minimal example
}

and I don't understand what the line

int width = *(int*)&data[18];

is actually doing. Why doesn't a simple cast from unsigned char to int, int width = (int)data[18];, work?

asked Dec 04 '19 by RBreight


1 Answer

Note

As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And as @NathanOliver-Reinstate Monica and @ChrisMM pointed out, this is undefined behavior and the result is not guaranteed.

According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax

int width = *(int*)&data[18];

reads the four bytes at offsets 18 through 21, inclusive (assuming a 32-bit int), and interprets them as an integer.

How?

  • &data[18] gets the address of the unsigned char at offset 18
  • (int*) reinterprets that address as a pointer to int, so that a dereference reads sizeof(int) bytes instead of one
  • the leading * dereferences that int* to produce the int value stored in those four bytes

So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.
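Since BMP stores multi-byte fields least-significant byte first (little-endian), on a little-endian machine the pointer trick produces the same value as assembling the four bytes by hand. As a sketch of that explicit version (using the same data buffer as the question, and <cstdint> for std::uint32_t):

std::uint32_t w =  static_cast<std::uint32_t>(data[18])
                | (static_cast<std::uint32_t>(data[19]) << 8)
                | (static_cast<std::uint32_t>(data[20]) << 16)
                | (static_cast<std::uint32_t>(data[21]) << 24);
int width = static_cast<int>(w);  // the width is a signed 32-bit field

Unlike the cast, this form does not depend on the host's byte order or on the buffer being suitably aligned for an int at offset 18.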

Why doesn't a simple cast to `int` work?

data[18] is a single unsigned char, so sizeof(data[18]) is 1 and its value can only be 0-255. A simple cast, int width = (int)data[18];, merely widens that one byte into an int: the bytes at offsets 19, 20, and 21 are never read, so any width larger than 255 cannot be recovered that way. Casting the address changes how many bytes the dereference reads: after the cast to int*, dereferencing yields sizeof(int) bytes, typically 4, namely the bytes at offsets 18 through 21, inclusive. (sizeof(&data[18]) itself is the size of a pointer, 4 or 8 bytes depending on the architecture, but it is the pointed-to type, not the pointer size, that determines how many bytes are read.) This is illustrated by the following example:

#include <iostream>
#include <bitset>
#include <string>

int main() {
    unsigned char data[32] = {};  // stand-in for the buffer read from the BMP file

    // Populate offsets 18-21 with a recognizable pattern for demonstration
    std::bitset<8> _bits(std::string("10011010"));
    unsigned long bits = _bits.to_ulong();
    for (int ii = 18; ii < 22; ii++) {
        data[ii] = static_cast<unsigned char>(bits);
    }

    std::cout << "data[18]                    -> 1 byte  "
        << std::bitset<32>(data[18]) << std::endl;
    std::cout << "*(unsigned short*)&data[18] -> 2 bytes "
        << std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
    std::cout << "*(int*)&data[18]            -> 4 bytes "
        << std::bitset<32>(*(int*)&data[18]) << std::endl;
}

Output:
data[18]                    -> 1 byte  00000000000000000000000010011010
*(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
*(int*)&data[18]            -> 4 bytes 10011010100110101001101010011010
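For completeness: if you want the same four-byte read without the undefined behavior mentioned in the note at the top, the usual well-defined approach is to copy the bytes with std::memcpy. A minimal sketch, using the same data buffer and offset as the question (add #include <cstring>):

int width;
std::memcpy(&width, &data[18], sizeof(width));  // copies sizeof(int) bytes; no aliasing or alignment issues

Compilers typically optimize this copy into a single load, so it usually costs nothing compared to the cast.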
answered Nov 10 '22 by William Miller