Why can't two TBytes use overlapping data?

Question

Consider the following XE6 code. The intention is that ThingData should be written to the console for both Thing1 & Thing2, but it is not. Why is that?

program BytesFiddle;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

type
  TThing = class
  private
    FBuf : TBytes;
    FData : TBytes;
    function GetThingData: TBytes;
    function GetThingType: Byte;
  public
    property ThingType : Byte read GetThingType;
    property ThingData : TBytes read GetThingData;

    constructor CreateThing(const AThingType : Byte; const AThingData: TBytes);
  end;

{ TThing1 }

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + 1);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));

  FData := @FBuf[1];
  SetLength(FData, Length(FBuf) - 1);
end;

function TThing.GetThingData: TBytes;
begin
  Result := FData;
end;

function TThing.GetThingType: Byte;
begin
  Result := FBuf[0];
end;

var
  Thing1, Thing2 : TThing;

begin
  try
    Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
    Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));

    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));

    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));

    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Johan · Accepted Answer

Let me walk you through the ways in which this code fails and how the compiler allows you to shoot yourself in the foot.

If you step through the code using the debugger you can see what happens.

enter image description here

After the initialization of Thing1 you can see that FData is filled with all zeros.
Strangely enough Thing2 is fine.
Therefore the error is in CreateThing. Let's investigate further...

In the oddly named constructor CreateThing you have the following line:

FData := @FBuf[1];

This looks like a simple assignment, but is really a call to DynArrayAssign

Project97.dpr.32: FData := @FBuf[1];
0042373A 8B45FC           mov eax,[ebp-$04]
0042373D 83C008           add eax,$08
00423743 8B5204           mov edx,[edx+$04]
00423746 42               inc edx
00423747 8B0DE03C4000     mov ecx,[$00403ce0]
0042374D E8E66DFEFF       call @DynArrayAsg      <<-- lots of stuff happening here.

One of the checks DynArrayAsg performs is to check whether the source dynamic array is empty or not.
DynArrayAsg also does a few other things which you need to be aware about.

Let's first have a look at the structure of a dynamic array; it's not just a simple pointer to an array!

Offset 32/64  |   Contents     
--------------+--------------------------------------------------------------
-8/-12        | 32 bit reference count
-4/-8         | 32 or 64 bit length indicator 
 0/ 0         | data of the array.

Performing FData = @FBuf[1] you are messing up with the prefix fields of the dynamic array.
The 4 bytes in front of @Fbuf[1] are interpreted as the length.
For Thing1 these are:

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 00 00 00  08 00 00 00  00  'S' 'n' .....
FData:    00 00 00 08  00 00 00 00  .............. //Hey that's a zero length.

Oops, when DynArrayAsg starts investigating it sees that what it thinks is the source for the assign has a length of zero, i.e. it thinks the source is empty and does not assign anything. It leaves FData unchanged!

Does Thing2 work as intended?
It looks like it does, but it actually fails in rather a bad way, let me show you.

enter image description here

You've successfully tricked the runtime into believing @FBuf[1] is a valid reference to a dynamic array.
Because of this the FData pointer has been updated to point to FBuf[1] (so far so good), and the reference count of FData has been increased by 1 (not good), also the runtime has grown the memory block holding the dynamic array to what it thinks is the correct size for FData (bad).

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 01 00 00  13 00 00 00  01  'S' 'n' .....
FData:    01 00 00 13  00 00 00 01  'S' ..............

Oops FData now has a refcount of 318,767,105 and a length of 16,777,216 bytes.
FBuf also has its length increased, but its refcount is now 257.

This is why you need the call to SetLength to undo the massive overallocation of memory. This still does not fix the reference counts though.
The overallocation may cause out of memory errors (esp. on 64-bit) and the wacky refcounts cause a memory leak because your arrays will never get freed.

The solution
As per David's answer: enable typed checked pointers: {$TYPEDADDRESS ON}

You can fix the code by defining FData as a normal PAnsiChar or PByte.
If you make sure to always terminate your assignments to FBuf with a double zero FData will work as expected.

Make FData a TBuffer like so:

TBuffer = record
private
  FData : PByte;
  function GetLength: cardinal;
  function GetType: byte;
public
  class operator implicit(const A: TBytes): TBuffer;
  class operator implicit(const A: TBuffer): PByte;
  property Length: cardinal read GetLength;
  property DataType: byte read GetType;
end;

Rewrite CreateThing like so:

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + Sizeof(AThingType) + 2);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));
  FBuf[Lengh(FBuf)-1]:= 0;
  FBuf[Lengh(FBuf)-2]:= 0;  //trailing zeros for compatibility with pansichar

  FData := FBuf;  //will call the implicit class operator.
end;

class operator TBuffer.implicit(const A: TBytes): TBuffer;
begin
  Result.FData:= PByte(@A[1]);
end;

I don't understand all this mucking about trying to outsmart the compiler.
Why not just declare FData like so:

type
  TMyData = record
    DataType: byte;
    Buffer: Ansistring;  
    ....

And work with that.

David Heffernan · Answer

The problem can be seen readily by enabling type-checked pointers. Add this to the top of your code:

{$TYPEDADDRESS ON}

The documentation says:

The $T directive controls the types of pointer values generated by the @ operator and the compatibility of pointer types.

In the {$T-} state, the result of the @ operator is always an untyped pointer (Pointer) that is compatible with all other pointer types. When @ is applied to a variable reference in the {$T+} state, the result is a typed pointer that is compatible only with Pointer and with other pointers to the type of the variable.

In the {$T-} state, distinct pointer types other than Pointer are incompatible (even if they are pointers to the same type). In the {$T+} state, pointers to the same type are compatible.

With that change your program fails to compile. This line fails:

FData := @FBuf[1];

The error message is:

E2010 Incompatible types: 'System.TArray<System.Byte>' and 'Pointer'

Now, FData is of type TArray<Byte> but @FBuf[1] is not a dynamic array but rather a pointer to a byte in the middle of a dynamic array. The two are not compatible. By operating in the default mode where pointers are not type-checked, the compiler lets you commit this terrible mistake. Quite why this is the default mode is utterly beyond me.

A dynamic array is more than a pointer to the first element – there is also metadata such as length and reference count. That metadata is stored at an offset from the first element. Hence your entire design is flawed. Store the type code in a separate variable, and not as part of the dynamic array.

Why can't two TBytes use overlapping data?

Tags:

delphi

dynamic-arrays

Hugh Jones

Video Answer

2 Answers

Johan

David Heffernan

Recent Activity

Donate For Us

Why can't two TBytes use overlapping data?

Tags:

delphi

dynamic-arrays

Hugh Jones

Video Answer

2 Answers

Johan

David Heffernan

Related questions

Recent Activity

Donate For Us