Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to generate random sample data using delphi

Tags:

delphi

i have a struture like this

const
  MaxSignalRecords=255;
type
  TSignalRecord=record
   signal1  : integer;
   signal2  : integer;
   signal3  : integer;
   signal4  : integer;
   signal5  : integer;
   signal6  : integer;
   bsignal1 : Boolean;
   bsignal2 : Boolean;
   bsignal3 : Boolean;
   bsignal4 : Boolean;
   bsignal5 : Boolean;
   bsignal6 : Boolean;
  end;

TListSignals = Array[0..MaxSignalRecords-1] of TSignalRecord;

and a procedure to generate random sample data

Procedure FillRandomListSignals(var ListSignals:TListSignals);
var
  i :Integer;
begin
  for i := 0 to MaxSignalRecords - 1 do
  with ListSignals[i] do
  begin
   signal1   :=Random(MaxInt);
   signal2   :=Random(MaxInt);
   signal3   :=Random(MaxInt);
   signal4   :=Random(MaxInt);
   signal5   :=Random(MaxInt);
   signal6   :=Random(MaxInt);
   bsignal1  :=Boolean(Random(2));
   bsignal2  :=Boolean(Random(2));
   bsignal3  :=Boolean(Random(2));
   bsignal4  :=Boolean(Random(2));
   bsignal5  :=Boolean(Random(2));
   bsignal6  :=Boolean(Random(2));
  end;
end;

How i can improve the performance of the FillRandomListSignals procedure ?

Edit : this structure is used to make thousands(can be millions) of calculations

for i:=1 to 1000000 do
begin
  CleartheList(MyList);
  FillRandomListSignals(MyList);
  DotheMath(MyList);
  DotheChart(MyList);
end;
like image 590
Salvador Avatar asked Apr 01 '11 05:04

Salvador


2 Answers

Speed is not your only concern when you generate random data, you actually want that data to be random, you don't want your experiment to be plagued by repetitive data or other pseudo-random generator problems. If you care more about speed then randomness, you can always use a function like this one, that'll be ultra-fast! </joke>.

Here's a post by Barry Kelly on Stack Overflow describing possible problems with the built-in random number generator. Not going to quote it here, go read it for yourself, it's good stuff.

To get to the conclusion, when I needed a PRNG good enough to generate huge amounts of random data I used a Mersenne Twister (wikipedia link), seeded by Delphi's PRNG.

Quote from Wikipedia on Mersene Twister:

For many applications the Mersenne twister is quickly becoming the pseudorandom number generator of choice. The Mersenne Twister is designed with Monte Carlo simulations and other statistical simulations in mind. Researchers primarily want high quality numbers but also benefit from its speed and portability.

And to break all my records on the number of links per post, I used this Delphi implementation.

And my final thought: Unless you're very good with math, stay away from home-made PRNG implementations. Just like hash functions, it's easy to make a mistake and they're very difficult to analyze.


Edit

Did some timing, using the following code. The 10,000,000 records took 1480 ms to generate using the Mersenne Twister. The same code, using Delphi's built in random number generator only took 250 ms, for the same 10M records. Something tells me it's not the random generator that needs optimizing, but something else in the code.

procedure TForm1.Button1Click(Sender: TObject);
var InitArray:array[0..99] of LongInt;

    i, N:Integer;
    TSR: TSignalRecord;

    CStart, CStop: Int64;

begin
  Randomize;
  for i:=0 to 99 do InitArray[i] := Random($effffff);
  InitMTbyArray(InitArray, Length(InitArray));

  CStart := GetTickCount;

  for i:=1 to 10000000 do
  begin
    TSR.signal1 := IRanMT;
    TSR.signal2 := IRanMT;
    TSR.signal3 := IRanMT;
    TSR.signal4 := IRanMT;
    TSR.signal5 := IRanMT;
    TSR.signal6 := IRanMT;

    N := IRanMT;

    TSR.bsignal1 := (N and 1) <> 0;
    TSR.bsignal2 := (N and 2) <> 0;
    TSR.bsignal3 := (N and 4) <> 0;
    TSR.bsignal4 := (N and 8) <> 0;
    TSR.bsignal5 := (N and 16) <> 0;
    TSR.bsignal6 := (N and 32) <> 0;
  end;

  CStop := GetTickCount;

  Caption := IntToStr(CStop - CStart);
end;
like image 182
Cosmin Prund Avatar answered Oct 15 '22 22:10

Cosmin Prund


You may not use the built-in Random() function per field, but globally some code with pipelined-optimized access, working with a random pre-generated array:

var
  crc32tab: array[byte] of cardinal;

procedure InitCrc32Tab;
var i,n: integer;
    crc: cardinal;
begin // this code size is only 105 bytes, generating 1 KB table content
  for i := 0 to 255 do begin
    crc := i;
    for n := 1 to 8 do
      if (crc and 1)<>0 then
        // $edb88320 from polynomial p=(0,1,2,4,5,7,8,10,11,12,16,22,23,26)
        crc := (crc shr 1) xor $edb88320 else
        crc := crc shr 1;
    crc32tab[i] := crc;
  end;
end;

type NativeUInt = cardinal; // before Delphi 2007

procedure RandomData(P: PAnsiChar; Len: integer);
var i: integer;
    seed0, seed1, seed2, seed3: cardinal;
begin
  if Len>=16 then
  begin
    seed0 := Random(maxInt);
    seed1 := seed0*$8088405;
    seed2 := seed1*$8088405;
    seed3 := seed2*$8088405;
    for i := 1 to Len shr 4 do begin  // pipelined loop for 16 bytes at once
      PCardinalArray(P)[0] := crc32tab[byte(seed0)] xor seed0;
      seed0 := seed0 xor NativeUInt(P);
      PCardinalArray(P)[1] := crc32tab[byte(seed1)] xor seed1;
      seed1 := seed1 xor NativeUInt(P);
      PCardinalArray(P)[2] := crc32tab[byte(seed2)] xor seed2;
      seed2 := seed3 xor NativeUInt(P);
      PCardinalArray(P)[3] := crc32tab[byte(seed3)] xor seed3;
      seed3 := seed3 xor NativeUInt(P);
      inc(P,16);
    end;
  end;
  for i := 1 to Len and 15 do begin
    P^ := PAnsiChar(@crc32tab)[NativeUInt(P) and 1023];
    inc(P);
  end;
end;

The above function can be called like this (you must call InitCrc32Tab procedure once in your program):

procedure FillRandomListSignals(var ListSignals: TListSignals);
begin
  RandomData(@ListSignals,sizeof(ListSignals));
end;

It will be faster than using the Random() function, because this function uses two integer multiplications, and is not pipelined at all. The above loop will handle 16 bytes at once, with no multiplication, and multiple operations per CPU clock, because I optimized it to use as much CPU pipelines as possible. We could perhaps play with the seed? variables, or use some optimized asm, but you've got the idea.

Postscriptum:

Since you're filling the list with random data, there is NO NEED of clearing it before. Just waste of time.

like image 29
Arnaud Bouchez Avatar answered Oct 15 '22 22:10

Arnaud Bouchez