
How do I read integers from a big-endian binary file when Windows/Delphi uses little-endian order?

I am very confused. I need to read binary files (.fsa files in the ABIF format by Applied Biosystems), and I ran into a problem reading signed integers. I am following the format manual at https://drive.google.com/file/d/1zL-r6eoTzFIeYDwH5L8nux2lIlsRx3CK/view?usp=sharing and, as an example, looking at the fDataSize field in the header of the file at https://drive.google.com/file/d/1rrL01B_gzgBw28knvFit6hUIA5jcCDry/view?usp=sharing

I know that it is supposed to be 2688 (according to the manual, it is a 32-bit signed integer), which is 00000000 00000000 00001010 10000000 in binary. Indeed, when I read these 32 bits as an array of 4 signed bytes (ShortInt), I get [0, 0, 10, -128], which has exactly the same bit pattern.
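To make the byte arithmetic concrete, here is a minimal sketch that combines the four bytes most significant first, which is how a big-endian format stores them. BigEndianToInt32 and TFourBytes are hypothetical names. Read as Byte rather than ShortInt, the last byte shows up as 128 ($80) instead of -128; the bit pattern is identical.

```pascal
program BigEndianBytes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  TFourBytes = array[0..3] of Byte;

// Hypothetical helper: combines 4 bytes stored most-significant-first
// (big-endian) into a native Integer, independent of the CPU byte order
function BigEndianToInt32(const B: TFourBytes): Integer;
begin
  Result := Integer((Cardinal(B[0]) shl 24) or (Cardinal(B[1]) shl 16) or
                    (Cardinal(B[2]) shl 8) or Cardinal(B[3]));
end;

const
  // The fDataSize bytes as they appear in the file: 00 00 0A 80
  DataSizeBytes: TFourBytes = ($00, $00, $0A, $80);

begin
  WriteLn(BigEndianToInt32(DataSizeBytes)); // 2688 = $0A * 256 + $80
end.
```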

However, if I read it as an Integer, I get 16809994, which is 00000001 00000000 10000000 00001010 in binary.

From several forums I gathered that people use the Swap and htonl functions to convert integers between little-endian and big-endian order, and that they recommend the BSWAP EAX instruction for 32-bit integers. In my case, however, neither gives the expected result: Swap applied to 16809994 returns 16779904 (00000001 00000000 00001010 10000000), and BSWAP converts 16809994 to 176160769 (00001010 10000000 00000000 00000001).
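As a quick check of these numbers, here is a minimal sketch of a pure-Pascal equivalent of BSWAP; the name SwapBytes32 is hypothetical. Reversing the four bytes of 16809994 ($0100800A) gives 176160769 ($0A800001), the same as BSWAP, while reversing $800A0000 (what the bytes 00 00 0A 80 would give when read as a little-endian Integer) gives 2688. This suggests the byte swap itself behaves as documented and the unexpected part is the value that was read.

```pascal
program ByteSwapCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Reverses the order of all four bytes, like the x86 BSWAP instruction.
// Every term is masked so the result stays within 32 bits even if the
// compiler evaluates the shifts in 64-bit arithmetic.
function SwapBytes32(Value: Cardinal): Cardinal;
begin
  Result := (Value shr 24) or
            ((Value shr 8) and $0000FF00) or
            ((Value shl 8) and $00FF0000) or
            ((Value shl 24) and $FF000000);
end;

begin
  // 16809994 = $0100800A: the value that was actually read
  WriteLn(SwapBytes32($0100800A)); // 176160769 ($0A800001)
  // $800A0000: the bytes 00 00 0A 80 read as a little-endian Integer
  WriteLn(SwapBytes32($800A0000)); // 2688 ($00000A80)
end.
```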

As you can see, the built-in functions don't do what I need. Swap gets the low 16 bits right, but for some reason reading these bytes as an Integer changes the left-most byte. So, what is wrong, and what should I do?

Update 1: to store the header data, I use the following record:

type
  TFasMainHeader = record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

Then, on a button click, I do the following:

aFileStream.Read(fas_main_header, SizeOf(TFasMainHeader));
with fas_main_header do begin
    if fFrmt <> 'ABIF' then raise Exception.Create('Not an ABIF file!');
    fVersion := Swap(fVersion);
    fElType := Swap(fElType);
    fElSize := Swap(fElSize);
...

Next I need to swap the Int32 fields correctly, but at this point fDataSize, for example, is already 16809994. Here is the state of the record in the debugger:

[Screenshot: the record's field values in the debugger]

It doesn't make sense to me, since there shouldn't be a 1 bit in the high byte of fDataSize (it also breaks the BSWAP result).

See the binary structure of the beginning of the file (the fDataSize bytes are highlighted):

[Screenshot: hex dump of the start of the file]

asked Nov 25 '20 at 12:11 by endocringe


2 Answers

The problem has nothing to do with endianness, but with Delphi records.

You have

type
  TFasMainHeader = record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

and you expect this record to overlay the bytes in your file, with fDataSize "on top of" 00 00 0A 80.

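But the Delphi compiler will add padding between the fields of the record to keep them properly aligned. Hence, your fDataSize will not be at the correct offset. In your record, fNumEls ends at offset 22, which is not a multiple of 4, so with the default alignment the compiler inserts 2 padding bytes and fDataSize starts at offset 24 instead of 22 ($16). It therefore picks up the bytes 0A 80 followed by the next two bytes of the file (00 01), and on a little-endian CPU the bytes 0A 80 00 01 read as $0100800A = 16809994.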
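The padding can be made visible by printing the offset of fDataSize in both layouts. This minimal sketch trims the record to the fields up to fDataSize; the type names TUnpacked and TPacked are hypothetical. With the compiler's default alignment setting, the unpacked offset should come out as 24, while the packed offset matches the file's 22 ($16).

```pascal
program RecordOffsets;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // The question's header fields up to fDataSize, default alignment
  TUnpacked = record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;  // ends at offset 22
    fDataSize : Integer;              // aligned up to a multiple of 4
  end;

  // The same fields without any padding
  TPacked = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;              // stays right after fNumEls
  end;

var
  U : TUnpacked;
  P : TPacked;
begin
  // Field offset = field address minus record address
  WriteLn('unpacked: ', NativeUInt(@U.fDataSize) - NativeUInt(@U));
  WriteLn('packed:   ', NativeUInt(@P.fDataSize) - NativeUInt(@P));
end.
```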

To fix this, use the packed keyword:

type
  TFasMainHeader = packed record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

Then the fields will be at the expected locations.

And then, of course, you can use any method you like to swap the byte order.

Preferably the BSWAP instruction.
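Putting both fixes together, here is a minimal sketch that overlays a packed record on the header bytes and then reverses the bytes of fDataSize with a pure-Pascal stand-in for BSWAP. To keep it self-contained, a byte buffer copied with Move replaces the file stream; apart from the "ABIF" signature and the fDataSize bytes 00 00 0A 80, the buffer contents are zero placeholders, and TFasHeaderStart and SwapBytes32 are hypothetical names.

```pascal
program ReadPackedHeader;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // The ABIF header fields up to fDataSize, packed so that the
  // in-memory layout matches the file byte for byte
  TFasHeaderStart = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;   // offset 22 ($16), stored big-endian
  end;

// Pure-Pascal stand-in for BSWAP: reverses the order of all four bytes
function SwapBytes32(Value: Cardinal): Cardinal;
begin
  Result := (Value shr 24) or
            ((Value shr 8) and $0000FF00) or
            ((Value shl 8) and $00FF0000) or
            ((Value shl 24) and $FF000000);
end;

var
  Buffer   : array[0..SizeOf(TFasHeaderStart) - 1] of Byte;
  Header   : TFasHeaderStart;
  DataSize : Integer;
begin
  // Placeholder header: zeros except the signature and the
  // fDataSize bytes 00 00 0A 80 at offsets 22..25
  FillChar(Buffer, SizeOf(Buffer), 0);
  Buffer[0] := Ord('A');
  Buffer[1] := Ord('B');
  Buffer[2] := Ord('I');
  Buffer[3] := Ord('F');
  Buffer[24] := $0A;
  Buffer[25] := $80;

  // Stands in for aFileStream.Read(Header, SizeOf(Header))
  Move(Buffer, Header, SizeOf(Header));

  // On a little-endian CPU Header.fDataSize now holds $800A0000;
  // reversing the bytes yields the big-endian value from the file
  DataSize := Integer(SwapBytes32(Cardinal(Header.fDataSize)));
  WriteLn(DataSize); // 2688
end.
```

With a real file, the packed record would be filled from the stream instead, and each multi-byte field swapped the same way (Swap for the Word fields).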

answered Sep 24 '22 at 03:09 by Andreas Rejbrand


Here is an implementation example using pure Pascal:

program FasDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

type
  // Big-endian 16-bit value as stored in the file: B0 is the high byte
  TFasInt16 = packed record
    B0, B1 : Byte;
    function ToUInt32 : UInt32;
    function ToInt32  : Int32;
    class operator Implicit(A: TFasInt16): Integer;      // Implicit conversion of TFasInt16 to Integer
    class operator Implicit(A: Integer)  : TFasInt16;    // Implicit conversion of Integer   to TFasInt16
  end;
  // Big-endian 32-bit value as stored in the file: W0 holds the high 16 bits
  TFasInt32 = packed record
    W0, W1 : TFasInt16;
    function ToUInt32 : UInt32;
    function ToInt32  : Int32;
    class operator Implicit(A: TFasInt32): Integer;      // Implicit conversion of TFasInt32 to Integer
    class operator Implicit(A: Integer)  : TFasInt32;    // Implicit conversion of Integer to TFasInt32
  end;


function TFasInt16.ToUInt32: UInt32;
begin
  Result := (B0 shl 8) + B1;
end;

function TFasInt16.ToInt32: Int32;
begin
  Result := Int16(B0 shl 8) + B1;
end;

class operator TFasInt16.Implicit(A: Integer): TFasInt16;
begin
  Result.B1 := Byte(A);
  Result.B0 := Byte(A shr 8);
end;

class operator TFasInt16.Implicit(A: TFasInt16): Integer;
begin
  Result := A.ToInt32;
end;

function TFasInt32.ToUInt32: UInt32;
begin
  Result := (W0.ToUInt32 shl 16) + W1.ToUInt32;
end;

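function TFasInt32.ToInt32: Int32;
begin
  // Explicit cast reinterprets the 32 unsigned bits as signed, so negative
  // values don't raise a range-check error when {$R+} is enabled
  Result := Int32((W0.ToUInt32 shl 16) + W1.ToUInt32);
end;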

class operator TFasInt32.Implicit(A: TFasInt32): Integer;
begin
  Result := A.ToInt32;
end;

class operator TFasInt32.Implicit(A: Integer): TFasInt32;
begin
  Result.W1 := Int16(A);
  Result.W0 := Int16(A shr 16);
end;

var
  Stream   : TFileStream;
  FasInt32 : TFasInt32;
  FasInt16 : TFasInt16;
  AInteger : Integer;
begin
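  // Path to the sample .fsa file; adjust as needed
  Stream := TFileStream.Create('C:\Users\fpiette\Downloads\A02-RD12-0002-35-0.5PP16-001.5sec.fsa', fmOpenRead);
  try
    // fDataSize starts at offset $16 (22) in the ABIF header
    Stream.Position := $16;
    // ReadBuffer raises an exception if fewer bytes are available
    Stream.ReadBuffer(FasInt32, SizeOf(FasInt32));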
    WriteLn(FasInt32.W1.ToUInt32, ' 0x', IntToHex(FasInt32.W1.ToUInt32, 8));
    WriteLn(FasInt32.W1.ToInt32,  ' 0x', IntToHex(FasInt32.W1.ToInt32,  8));
    WriteLn(FasInt32.ToUInt32,    ' 0x', IntToHex(FasInt32.ToUInt32,    8));
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));

    WriteLn;
    WriteLn('Test implicit conversion 16 bits to integer ');
    AInteger := FasInt32.W1;
    WriteLn(AInteger,             ' 0x', IntToHex(AInteger,     8));

    WriteLn;
    WriteLn('Test implicit conversion 32 bits to integer ');
    AInteger := FasInt32;
    WriteLn(AInteger,             ' 0x', IntToHex(AInteger,     8));

    WriteLn;
    WriteLn('Test implicit conversion 16 bits from integer');
    FasInt16 := 1234;
    WriteLn(FasInt16.ToInt32,     ' 0x', IntToHex(FasInt16.ToInt32,  8));
    FasInt16 := -1234;
    WriteLn(FasInt16.ToInt32,     ' 0x', IntToHex(FasInt16.ToInt32,  8));

    WriteLn;
    WriteLn('Test implicit conversion 32 bits from integer');
    FasInt32 := 12345678;
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));
    FasInt32 := -12345678;
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));

    ReadLn;
  finally
    FreeAndNil(Stream);
  end;
end.

If your Delphi version supports it, you can add the inline directive to these methods.

I implemented implicit conversions to and from Integer using operator overloading, so the types can be used without calling the conversion routines explicitly: the compiler does the job for us!

Of course, other operators can be overloaded as well; you get the idea.

To access the FAS header and other structures, you can use the types TFasInt32 and TFasInt16 instead of Integer and Word. The rest of the code can then be written just as if the data were not big-endian: the compiler automatically converts back and forth to native (little-endian) integers.

answered Sep 21 '22 at 03:09 by fpiette