I am very confused. I need to read binary files (.fsa extension, the ABIF format by Applied Biosystems) and I ran into a problem reading signed integers. I am doing everything according to this manual: https://drive.google.com/file/d/1zL-r6eoTzFIeYDwH5L8nux2lIlsRx3CK/view?usp=sharing So, for example, let's look at the fDataSize field in the header of this file: https://drive.google.com/file/d/1rrL01B_gzgBw28knvFit6hUIA5jcCDry/view?usp=sharing
I know that it is supposed to be 2688 (according to the manual, it is a signed integer of 32 bits), which is 00000000 00000000 00001010 10000000 in binary form. Indeed, when I read these 32 bits as an array of 4 bytes, I get [0, 0, 10, -128], which is exactly the same in binary form.
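For reference, the four bytes can also be combined by hand, which does not depend on the byte order of the machine. This is only a minimal sketch: the helper name BigEndianToInt32 and the TQuad type are made up for illustration and do not come from the manual.

```pascal
program BigEndianBytes;
{$APPTYPE CONSOLE}

type
  TQuad = array[0..3] of Byte;

// Hypothetical helper: combines four bytes, most significant first,
// into a signed 32-bit value.
function BigEndianToInt32(const B: TQuad): Int32;
begin
  Result := Int32((UInt32(B[0]) shl 24) or (UInt32(B[1]) shl 16) or
                  (UInt32(B[2]) shl 8) or UInt32(B[3]));
end;

const
  // The bytes [0, 0, 10, -128]; -128 as a signed byte is $80 unsigned
  Raw: TQuad = ($00, $00, $0A, $80);

begin
  WriteLn(BigEndianToInt32(Raw));  // 2688
end.
```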
However, if I read it as Integer, it results in 16809994, which is 00000001 00000000 10000000 00001010 in bits.
From what I gathered on multiple forums, people use the Swap and htonl functions to convert integers between little-endian and big-endian order, and they also recommend the BSWAP EAX instruction for 32-bit integers. But in this case they seem to work in the wrong way. Specifically, Swap applied to 16809994 returns 16779904, or 00000001 00000000 00001010 10000000, and the BSWAP instruction converts 16809994 to 176160769, i.e. 00001010 10000000 00000000 00000001.
As we can see, built-in functions do something different from what I need. Swap is likely to return the correct result, but, for some reason, reading these bits as an Integer changes the left-most byte. So, what is wrong and what do I do?
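To make the numbers above easier to check, here is a small sketch of a full 32-bit byte reversal in plain Pascal, which is what BSWAP does to a register. The name ByteSwap32 is hypothetical, and the sketch assumes a compiler that knows the UInt32 type.

```pascal
program ByteSwapDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: reverses the order of the four bytes of Value.
function ByteSwap32(Value: UInt32): UInt32;
begin
  Result := ((Value and $000000FF) shl 24) or
            ((Value and $0000FF00) shl 8) or
            ((Value and $00FF0000) shr 8) or
            ((Value and $FF000000) shr 24);
end;

begin
  // The value from the question, $0100800A, becomes $0A800001
  WriteLn(ByteSwap32(16809994));   // 176160769
  // The bytes 00 00 0A 80 read as a little-endian integer are $800A0000
  WriteLn(ByteSwap32($800A0000));  // 2688
end.
```

So the reversal itself behaves as expected; it only gives 2688 when it is applied to the four bytes 00 00 0A 80.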
Update 1: For storing the header data I use the following record:
type
  TFasMainHeader = record
    fFrmt       : array[1..4] of ansiChar;
    fVersion    : Word;
    fDir        : array[1..4] of ansiChar;
    fNumber     : array[1..4] of Byte; //
    fElType     : Word;
    fElSize     : Word;
    fNumEls     : array[1..4] of Byte; //
    fDataSize   : Integer;
    fDataOffset : Integer;
    fDO         : word;
    fDataHandle : array[1..98] of Byte;
  end;
Then, on the button click, I perform the following:
aFileStream.Read(fas_main_header, SizeOf(TFasMainHeader));
with fas_main_header do begin
  if fFrmt <> 'ABIF' then raise Exception.Create('Not an ABIF file!');
  fVersion := Swap(fVersion);
  fElType := Swap(fElType);
  fElSize := Swap(fElSize);
  ...
Next I need to swap Int32 variables in the right way, but at this point fDataSize, for example, is 16809994. See the state of the record in detail during debugging:
It doesn't make sense to me since there shouldn't be a one-bit in the binary representation of fDataSize value (it also screws the BSWAP result).
See the binary structure of the beginning of the file (the fDataSize bytes are highlighted):
Big endian machine: Stores data big-end first. When looking at multiple bytes, the first byte (lowest address) is the biggest. Little endian machine: Stores data little-end first. When looking at multiple bytes, the first byte is smallest.
All versions of Windows that you'll see are little-endian, yes. The NT kernel is portable and has been run on big-endian hardware, but that is not something you will meet on a PC.
On little endian platforms, the value 1 is stored in one byte as 01 (the same as big endian), in two bytes as 01 00, and in four bytes as 01 00 00 00. If an integer is negative, the "two's complement" representation is used. The high-order bit of the most significant byte of the integer will be set on.
In the case of little endian format, the least significant byte appears first, followed by the most significant byte. The letter 'T' has a value of 0x54 and is represented in 16 bit little endian as 54 00.
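A quick way to see this layout for yourself is to overlay a byte array on an integer and print the bytes in address order. This is a minimal sketch; HostIsLittleEndian is a made-up helper name, and the comments assume a little-endian machine such as any Windows PC.

```pascal
program EndianPeek;
{$APPTYPE CONSOLE}

// Hypothetical helper: stores 'T' ($54) in a 16-bit variable and checks
// which byte sits at the lowest address.
function HostIsLittleEndian: Boolean;
var
  Probe : Word;
  Raw   : array[0..1] of Byte absolute Probe;  // same memory, viewed as bytes
begin
  Probe := $0054;
  Result := Raw[0] = $54;  // 54 00 in memory means little-endian
end;

var
  Value : UInt32;
  Bytes : array[0..3] of Byte absolute Value;

begin
  WriteLn('Little-endian host: ', HostIsLittleEndian);
  Value := 1;
  // On a little-endian host the bytes are 01 00 00 00
  WriteLn(Bytes[0], ' ', Bytes[1], ' ', Bytes[2], ' ', Bytes[3]);
end.
```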
The problem has nothing to do with endianness, but with Delphi records.
You have
type
  TFasMainHeader = record
    fFrmt       : array[1..4] of ansiChar;
    fVersion    : Word;
    fDir        : array[1..4] of ansiChar;
    fNumber     : array[1..4] of Byte; //
    fElType     : Word;
    fElSize     : Word;
    fNumEls     : array[1..4] of Byte; //
    fDataSize   : Integer;
    fDataOffset : Integer;
    fDO         : word;
    fDataHandle : array[1..98] of Byte;
  end;
and you expect this record to overlay the bytes in your file, with fDataSize "on top of" 00 00 0A 80.
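But the Delphi compiler will add padding between the fields of the record to make them properly aligned. Hence, your fDataSize will not be at the correct offset: with the default alignment an Integer must start at a multiple of 4, so the field moves from offset 22 to offset 24. It then covers the bytes 0A 80 followed by the first two bytes of fDataOffset, which is why you see 16809994 ($0100800A) instead of 2688.
To fix this, use the packed keyword: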
type
  TFasMainHeader = packed record
    fFrmt       : array[1..4] of ansiChar;
    fVersion    : Word;
    fDir        : array[1..4] of ansiChar;
    fNumber     : array[1..4] of Byte; //
    fElType     : Word;
    fElSize     : Word;
    fNumEls     : array[1..4] of Byte; //
    fDataSize   : Integer;
    fDataOffset : Integer;
    fDO         : word;
    fDataHandle : array[1..98] of Byte;
  end;
Then the fields will be at the expected locations.
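If you want to verify the effect of packed on your own compiler, you can print the offset of fDataSize in both variants. The sketch below copies only the leading fields of the header; the type names TPlainStart and TPackedStart and the two helper functions are made up for the demonstration.

```pascal
program OffsetDemo;
{$APPTYPE CONSOLE}

type
  // Leading fields of the header, without "packed"
  TPlainStart = record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;
  end;

  // The same fields, with "packed"
  TPackedStart = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;
  end;

var
  Plain : TPlainStart;
  Tight : TPackedStart;

// Byte offset of fDataSize from the start of the unpacked record
function PlainOffset: NativeUInt;
begin
  Result := NativeUInt(@Plain.fDataSize) - NativeUInt(@Plain);
end;

// Byte offset of fDataSize from the start of the packed record
function PackedOffset: NativeUInt;
begin
  Result := NativeUInt(@Tight.fDataSize) - NativeUInt(@Tight);
end;

begin
  WriteLn('plain : offset ', PlainOffset, ', size ', SizeOf(TPlainStart));
  WriteLn('packed: offset ', PackedOffset, ', size ', SizeOf(TPackedStart));
end.
```

With the default alignment settings I would expect offset 24 for the plain record and offset 22 for the packed one. The plain value depends on the alignment directive in effect, while the packed value should always be 22.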
And then -- of course -- you can use any method you like to swap the byte order, preferably the BSWAP instruction.
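Putting both steps together might look like the sketch below. It uses a plain Pascal byte reversal instead of BSWAP so that it does not depend on inline assembly. The names THeaderStart and ByteSwap32 are hypothetical, and the sample bytes are illustrative rather than copied from your file, except for the final 00 00 0A 80. It also assumes a little-endian host.

```pascal
program HeaderDemo;
{$APPTYPE CONSOLE}

type
  // Only the first 26 bytes of the header, enough to reach fDataSize
  THeaderStart = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;
  end;

const
  // Illustrative bytes (assumption); only the last four are from the question
  Sample: array[0..25] of Byte = (
    $41, $42, $49, $46,   // 'ABIF'
    $00, $65,             // fVersion
    $74, $64, $69, $72,   // fDir
    $00, $00, $00, $01,   // fNumber
    $03, $FF,             // fElType
    $00, $1C,             // fElSize
    $00, $00, $00, $60,   // fNumEls
    $00, $00, $0A, $80);  // fDataSize, big-endian 2688

// Hypothetical helper: reverses the four bytes of a 32-bit value
function ByteSwap32(Value: UInt32): UInt32;
begin
  Result := ((Value and $000000FF) shl 24) or
            ((Value and $0000FF00) shl 8) or
            ((Value and $00FF0000) shr 8) or
            ((Value and $FF000000) shr 24);
end;

// Copies the sample bytes over the packed record and fixes the byte order
function ReadDataSize: Integer;
var
  Header: THeaderStart;
begin
  Move(Sample, Header, SizeOf(Header));
  Result := Integer(ByteSwap32(UInt32(Header.fDataSize)));
end;

begin
  WriteLn(ReadDataSize);  // 2688
end.
```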
Here is an implementation example using pure Pascal:
program FasDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

type
  // 16-bit big-endian value: B0 is the most significant byte
  TFasInt16 = packed record
    B0, B1 : Byte;
    function ToUInt32 : UInt32;
    function ToInt32 : Int32;
    class operator Implicit(A: TFasInt16): Integer;   // Implicit conversion of TFasInt16 to Integer
    class operator Implicit(A: Integer) : TFasInt16;  // Implicit conversion of Integer to TFasInt16
  end;

  // 32-bit big-endian value: W0 is the most significant word
  TFasInt32 = packed record
    W0, W1 : TFasInt16;
    function ToUInt32 : UInt32;
    function ToInt32 : Int32;
    class operator Implicit(A: TFasInt32): Integer;   // Implicit conversion of TFasInt32 to Integer
    class operator Implicit(A: Integer) : TFasInt32;  // Implicit conversion of Integer to TFasInt32
  end;

function TFasInt16.ToUInt32: UInt32;
begin
  Result := (B0 shl 8) + B1;
end;

function TFasInt16.ToInt32: Int32;
begin
  // The Int16 cast sign-extends the high byte
  Result := Int16(B0 shl 8) + B1;
end;

class operator TFasInt16.Implicit(A: Integer): TFasInt16;
begin
  Result.B1 := Byte(A);
  Result.B0 := Byte(A shr 8);
end;

class operator TFasInt16.Implicit(A: TFasInt16): Integer;
begin
  Result := A.ToInt32;
end;

function TFasInt32.ToUInt32: UInt32;
begin
  Result := (W0.ToUInt32 shl 16) + W1.ToUInt32;
end;

function TFasInt32.ToInt32: Int32;
begin
  // Explicit cast so that negative values do not fail when range checking is on
  Result := Int32((W0.ToUInt32 shl 16) + W1.ToUInt32);
end;

class operator TFasInt32.Implicit(A: TFasInt32): Integer;
begin
  Result := A.ToInt32;
end;

class operator TFasInt32.Implicit(A: Integer): TFasInt32;
begin
  Result.W1 := Int16(A);
  Result.W0 := Int16(A shr 16);
end;

var
  Stream : TFileStream;
  FasInt32 : TFasInt32;
  FasInt16 : TFasInt16;
  AInteger : Integer;
begin
  Stream := TFileStream.Create('C:\Users\fpiette\Downloads\A02-RD12-0002-35-0.5PP16-001.5sec.fsa', fmOpenRead);
  try
    // fDataSize starts at offset $16 (22) in the file
    Stream.Position := $16;
    Stream.Read(FasInt32, SizeOf(FasInt32));
    WriteLn(FasInt32.W1.ToUInt32, ' 0x', IntToHex(FasInt32.W1.ToUInt32, 8));
    WriteLn(FasInt32.W1.ToInt32, ' 0x', IntToHex(FasInt32.W1.ToInt32, 8));
    WriteLn(FasInt32.ToUInt32, ' 0x', IntToHex(FasInt32.ToUInt32, 8));
    WriteLn(FasInt32.ToInt32, ' 0x', IntToHex(FasInt32.ToInt32, 8));
    WriteLn;
    WriteLn('Test implicit conversion 16 bits to integer ');
    AInteger := FasInt32.W1;
    WriteLn(AInteger, ' 0x', IntToHex(AInteger, 8));
    WriteLn;
    WriteLn('Test implicit conversion 32 bits to integer ');
    AInteger := FasInt32;
    WriteLn(AInteger, ' 0x', IntToHex(AInteger, 8));
    WriteLn;
    WriteLn('Test implicit conversion 16 bits from integer');
    FasInt16 := 1234;
    WriteLn(FasInt16.ToInt32, ' 0x', IntToHex(FasInt16.ToInt32, 8));
    FasInt16 := -1234;
    WriteLn(FasInt16.ToInt32, ' 0x', IntToHex(FasInt16.ToInt32, 8));
    WriteLn;
    WriteLn('Test implicit conversion 32 bits from integer');
    FasInt32 := 12345678;
    WriteLn(FasInt32.ToInt32, ' 0x', IntToHex(FasInt32.ToInt32, 8));
    FasInt32 := -12345678;
    WriteLn(FasInt32.ToInt32, ' 0x', IntToHex(FasInt32.ToInt32, 8));
    ReadLn;
  finally
    FreeAndNil(Stream);
  end;
end.
If your Delphi version supports it, you can add the inline directive to these methods.
I made implicit conversions to/from integer using operator overloading. With them, the types can be used without calling conversion routines: the compiler does the job for us!
Of course, other operator overloads can be added; you get the idea.
To access the FAS header and other structures, you can use the types TFasInt32 and TFasInt16 instead of Word and Integer. The rest of the code will look just as if the data were not big-endian! The compiler will automatically convert back and forth to native integers (little-endian).
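As a rough illustration of that last point, here is a stripped-down sketch of a header record whose size field is a big-endian type. TBigEndian32 is a simplified, hypothetical stand-in for TFasInt32 (four raw bytes instead of two words), THeaderStart covers only the first 26 bytes, and the sample bytes are made up except for the final 00 00 0A 80. It assumes a Delphi version that supports record methods, operator overloading and inline.

```pascal
program BigEndianField;
{$APPTYPE CONSOLE}

type
  // Hypothetical simplified big-endian integer: most significant byte first
  TBigEndian32 = packed record
    B : array[0..3] of Byte;
    function ToInt32: Int32; inline;
    class operator Implicit(const A: TBigEndian32): Integer;
  end;

  THeaderStart = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : TBigEndian32;  // was Integer
  end;

function TBigEndian32.ToInt32: Int32;
begin
  Result := Int32((UInt32(B[0]) shl 24) or (UInt32(B[1]) shl 16) or
                  (UInt32(B[2]) shl 8) or UInt32(B[3]));
end;

class operator TBigEndian32.Implicit(const A: TBigEndian32): Integer;
begin
  Result := A.ToInt32;
end;

const
  // Illustrative bytes (assumption); only the last four are from the question
  Sample: array[0..25] of Byte = (
    $41, $42, $49, $46, $00, $65, $74, $64, $69, $72, $00, $00, $00,
    $01, $03, $FF, $00, $1C, $00, $00, $00, $60, $00, $00, $0A, $80);

// Overlays the sample bytes and reads the field like a native Integer
function SampleDataSize: Integer;
var
  Header: THeaderStart;
begin
  Move(Sample, Header, SizeOf(Header));
  Result := Header.fDataSize;  // implicit conversion, no explicit swap
end;

begin
  WriteLn(SampleDataSize);  // 2688
end.
```

The assignment in SampleDataSize is the only place where the byte order is handled, and it happens through the operator rather than through an explicit swap call.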