Detect endianness of binary file data

Recently I was (again) reading about endianness. I know how to identify the endianness of the host, as there are lots of posts on SO, and I have also seen this, which I think is a pretty good resource.

However, one thing I would like to know is how to detect the endianness of an input binary file. For example, I am reading a binary file (using C++) like the following:

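#include <cstring>   // memcpy
#include <fstream>   // ifstream
using namespace std;

// DATA_DIMENSION is assumed to be a compile-time constant defined elsewhere.
ifstream mydata("mydata.raw", ios::binary);

short value;
char buf[sizeof(short)];
int dataCount = 0;

short myDataMat[DATA_DIMENSION][DATA_DIMENSION];
// Read one short at a time; stop at end of file or when the matrix is full.
while (dataCount < DATA_DIMENSION * DATA_DIMENSION && mydata.read(buf, sizeof(buf)))
{
    // The bytes are copied in file order, so value is interpreted in host byte order.
    memcpy(&value, buf, sizeof(value));
    myDataMat[dataCount / DATA_DIMENSION][dataCount % DATA_DIMENSION] = value;
    dataCount++;
}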

I would like to know how I can detect the endianness of the data in mydata.raw, and whether endianness affects this program at all.

Additional Information:

  • I am only manipulating the data in myDataMat using mathematical operations; no pointer or bitwise operations are performed on the data.
  • My machine (host) is little endian.
Sayan Pal asked Dec 15 '22 05:12



1 Answer

It is impossible to "detect" the endianness of data in general, just as it is impossible to detect whether the data is an array of 4-byte integers or twice as many 2-byte integers. Without any knowledge of the representation, raw data is just a mass of meaningless bits.
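To make the ambiguity concrete, here is a minimal sketch that decodes the same four bytes under both byte orders. The helper names read_u32_le and read_u32_be are made up for this illustration and do not come from any library; they build the value with shifts, so the result should not depend on the endianness of the host.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration: decode 4 bytes as an unsigned
// 32-bit integer without relying on the host's byte order.

// Little endian: the least significant byte comes first.
std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Big endian: the most significant byte comes first.
std::uint32_t read_u32_be(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}
```

For the bytes 02 00 00 00, read_u32_le gives 2 and read_u32_be gives 33554432. Both are valid integers, and nothing in the bytes themselves says which one the writer intended.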

However, with some extra knowledge about the data representation, it becomes possible. Some examples:

  • Most file formats mandate a particular endianness, in which case this is never a problem.
  • Unicode text files may optionally start with a byte order mark. The same idea can be implemented by other data representations.
  • Some file formats contain a checksum. You can guess one endianness and, if the checksum does not match, try again with the other. It is unlikely that the checksum will match under the wrong interpretation of the data.
  • Sometimes you can make guesses based on the data. Is the temperature outside 33'554'432 degrees, or maybe 2? You can pick the endianness that represents sane data. Of course, this type of guesswork fails miserably when the aliens invade and start melting our planet.
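Two of the ideas above, the byte order mark and the sanity guess, can be sketched roughly as follows. Everything named here (ByteOrder, detect_utf16_bom, decode_i16, guess_i16_order) is hypothetical and invented for this example. The sketch assumes the text is UTF-16 for the BOM check, and that the samples are signed 16-bit values whose plausible range [lo, hi] the caller already knows.

```cpp
#include <cstddef>

enum class ByteOrder { Little, Big, Unknown };

// Look for a UTF-16 byte order mark (U+FEFF) at the start of the buffer.
// Assumption: the data is UTF-16; FF FE is also the prefix of a UTF-32 LE BOM.
ByteOrder detect_utf16_bom(const unsigned char* data, std::size_t size)
{
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return ByteOrder::Little;
        if (data[0] == 0xFE && data[1] == 0xFF) return ByteOrder::Big;
    }
    return ByteOrder::Unknown;
}

// Decode two bytes as a signed 16-bit value in the requested byte order.
int decode_i16(const unsigned char* p, bool little_endian)
{
    const int u = little_endian ? (p[0] | (p[1] << 8)) : ((p[0] << 8) | p[1]);
    return u >= 0x8000 ? u - 0x10000 : u;  // map 0x8000..0xFFFF to negatives
}

// Count how many samples fall inside [lo, hi] under each byte order and
// pick the order with more plausible values. A tie yields Unknown.
ByteOrder guess_i16_order(const unsigned char* data, std::size_t size,
                          int lo, int hi)
{
    std::size_t le_ok = 0, be_ok = 0;
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        const int le = decode_i16(data + i, true);
        const int be = decode_i16(data + i, false);
        if (le >= lo && le <= hi) ++le_ok;
        if (be >= lo && be <= hi) ++be_ok;
    }
    if (le_ok > be_ok) return ByteOrder::Little;
    if (be_ok > le_ok) return ByteOrder::Big;
    return ByteOrder::Unknown;
}
```

The guess is only as good as the assumed range: values that look the same in both orders (such as 0x0101) or data that genuinely leaves the expected range will give a tie or a wrong answer, which is the alien-invasion failure mode described above.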
eerorika answered Jan 04 '23 08:01
