Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Junk Generator Speed Problem

Tags:

delphi

I am looking into generating a file ( 750 MB ) full of random bytes. The code I am using in a separate thread looks like this:

I allocated a buffer of that size since writing on the disk consumes more time:

function Generate(buf:Pointer):DWORD;stdcall;
var
i:DWORD;
begin
      for i := 0 to keysize -1 do
            PByte(DWORD(buf) + i)^ := Random(256);
      Result:=0;
end;

The problem is that it takes ages until the process completes. Any ideas for a faster method? I will try to implement it in assembly if there isn't any alternative.

like image 479
opc0de Avatar asked Sep 03 '11 11:09

opc0de


4 Answers

This sounded like a nice practice problem so I went ahead and implemented a parallel solution. It uses slightly over 3 seconds to generate 750 MB file and uses over 90% CPU during its work. (SSD disk helps, too. 3,5 seconds were needed to generate the file on a RAID0 disk pair and 4 seconds to generate a file on a slower 512 GB disk.)

All reused code is available with the OpenBSD license (which is almost "use however you wish"): DSiWin32, GpStuff, GpRandomGen, Otl*.

uses
  DSiWin32,
  GpStuff,
  GpRandomGen,
  OtlCommon,
  OtlCollections,
  OtlParallel;

{$R *.dfm}

procedure FillBuffer(buf: pointer; bufSize: integer; randomGen: TGpRandom);
var
  buf64: PInt64;
  buf8 : PByte;
  i    : integer;
  rnd  : int64;
begin
  buf64 := buf;
  for i := 1 to bufSize div SizeOf(int64) do begin
    buf64^ := randomGen.Rnd64;
    Inc(buf64);
  end;
  rnd := randomGen.Rnd64;
  buf8 := PByte(buf64);
  for i := 1 to bufSize mod SizeOf(int64) do begin
    buf8^ := rnd AND $FF;
    rnd := rnd SHR 8;
    Inc(buf8);
  end;
end; { FillBuffer }

procedure CreateRandomFile(fileSize: integer; output: TStream);
const
  CBlockSize = 1 * 1024 * 1024 {1 MB};
var
  buffer        : TOmniValue;
  lastBufferSize: integer;
  memStr        : TMemoryStream;
  numBuffers    : integer;
  outQueue      : IOmniBlockingCollection;
begin
  outQueue := TOmniBlockingCollection.Create;
  numBuffers := (fileSize - 1) div CBlockSize + 1;
  lastBufferSize := (fileSize - 1) mod CBlockSize + 1;
  Parallel.ForEach(1, numBuffers).NoWait
    .NumTasks(Environment.Process.Affinity.Count)
    .OnStop(
      procedure
      begin
        outQueue.CompleteAdding;
      end)
    .Initialize(
      procedure(var taskState: TOmniValue)
      begin
        taskState := TGpRandom.Create;
      end)
    .Finalize(
      procedure(const taskState: TOmniValue)
      begin
        taskState.AsObject.Free;
      end)
    .Execute(
      procedure(const value: integer; var taskState: TOmniValue)
      var
        buffer      : TMemoryStream;
        bytesToWrite: integer;
      begin
        if value = numBuffers then
          bytesToWrite := lastBufferSize
        else
          bytesToWrite := CBlockSize;
        buffer := TMemoryStream.Create;
        buffer.Size := bytesToWrite;
        FillBuffer(buffer.Memory, bytesToWrite, taskState.AsObject as TGpRandom);
        outQueue.Add(buffer);
      end);
  for buffer in outQueue do begin
    memStr := buffer.AsObject as TMemoryStream;
    output.CopyFrom(memStr, 0);
    FreeAndNil(memStr);
  end;
end;

procedure TForm43.btnRandomClick(Sender: TObject);
var
  fileStr: TFileStream;
  time   : int64;
begin
  time := DSiTimeGetTime64;
  try
    fileStr := TFileStream.Create('e:\0\random.dat', fmCreate);
    try
      CreateRandomFile(750*1024*1024, fileStr);
    finally FreeAndNil(fileStr); end;
  finally Caption := Format('Completed in %d ms', [DSiElapsedTime64(time)]); end;
end;

EDIT: Using ForEach in this case was not really elegant solution so I enhanced OmniThreadLibrary with Parallel.ParallelTask and with better IOmniCounter. Using release 993 (or newer) from the SVN you can solve this multiple-producer-single-consumer problem as follows.

procedure CreateRandomFile(fileSize: integer; output: TStream);
const
  CBlockSize = 1 * 1024 * 1024 {1 MB};
var
  buffer   : TOmniValue;
  memStr   : TMemoryStream;
  outQueue : IOmniBlockingCollection;
  unwritten: IOmniCounter;
begin
  outQueue := TOmniBlockingCollection.Create;
  unwritten := CreateCounter(fileSize);
  Parallel.ParallelTask.NoWait
    .NumTasks(Environment.Process.Affinity.Count)
    .OnStop(Parallel.CompleteQueue(outQueue))
    .Execute(
      procedure
      var
        buffer      : TMemoryStream;
        bytesToWrite: integer;
        randomGen   : TGpRandom;
      begin
        randomGen := TGpRandom.Create;
        try
          while unwritten.Take(CBlockSize, bytesToWrite) do begin
            buffer := TMemoryStream.Create;
            buffer.Size := bytesToWrite;
            FillBuffer(buffer.Memory, bytesToWrite, randomGen);
            outQueue.Add(buffer);
          end;
        finally FreeAndNil(randomGen); end;
      end
    );
  for buffer in outQueue do begin
    memStr := buffer.AsObject as TMemoryStream;
    output.CopyFrom(memStr, 0);
    FreeAndNil(memStr);
  end;
end;

EDIT2: A longer blog post about this problem: Life after 2.1: Parallel data production (Introducing Parallel.Task)

like image 151
gabr Avatar answered Nov 19 '22 10:11

gabr


I don't know about Delphi, but it might be wasting time on the Random(256) call. Why don't you handcode something pseudorandom to the effect of

n = (n * 1103515245 + 12345) & 0xff;

Let n start somewhere and use the recursion, such as this one, to generate next n. It's not really that random, but it should do for creating random files.

EDIT Some food for thought. If you are creating this file in hopes that it will not be easily compressible, then the method outlined above isn't that good, because of the & 0xff part. It's better then to do

n = (n * 1103515245 + 12345) & 0x7fffffff;

as 0x7fffffff = 2147483647 is a prime number. And store the exact larger value of n, and do a n % 256 on assignment. I've had some good runs with this choice of constants, and much prefer it as entropy source to the in-built .NET alternative, as it's many times faster, and you seldom need truly random or better pseudorandom numbers anyway.

like image 6
Gleno Avatar answered Nov 19 '22 09:11

Gleno


The problem is that Random() has limited entropy. And if you generate 750MiB of data, you will get only one of the 2^31 possible different strings (since that is the period of the RNG), not 2^(750*1024*1024*8), which would be the case if generator was perfect. This is a huge disparity.

In short, if you use Random(), your data is not random at all. Anyone could guess all 750MiB of data from a 4MB sample / piece of the file.

You have to do it different way. If you have linux machine, execute this command from your program:

dd if=/dev/urandom of=file.img bs=1M count=750

It finishes in under a half a minute on my old laptop.

like image 4
Rok Kralj Avatar answered Nov 19 '22 11:11

Rok Kralj


As the Random function has no good distribution anyway, you may reduce your code by nearly a factor of four with the following:

function Generate(buf: Pointer): DWORD; stdcall;
var
  i: DWORD;
  p: PInteger;
begin
  p := buf;
  for i := 0 to (keysize div 4) - 1 do begin
    p^ := Random(MaxInt);
    Inc(p);
  end;
  Result := 0;
end;

Update: The above code needs about 650ms on my system, while the original code needs about 3s.

like image 3
Uwe Raabe Avatar answered Nov 19 '22 09:11

Uwe Raabe