I am looking into generating a file ( 750 MB ) full of random bytes. The code I am using in a separate thread looks like this:
I allocated a buffer of that size since writing on the disk consumes more time:
function Generate(buf:Pointer):DWORD;stdcall;
var
i:DWORD;
begin
for i := 0 to keysize -1 do
PByte(DWORD(buf) + i)^ := Random(256);
Result:=0;
end;
The problem is that it takes ages until the process completes. Any ideas for a faster method? I will try to implement it in assembly if there isn't any alternative.
This sounded like a nice practice problem so I went ahead and implemented a parallel solution. It uses slightly over 3 seconds to generate 750 MB file and uses over 90% CPU during its work. (SSD disk helps, too. 3,5 seconds were needed to generate the file on a RAID0 disk pair and 4 seconds to generate a file on a slower 512 GB disk.)
All reused code is available with the OpenBSD license (which is almost "use however you wish"): DSiWin32, GpStuff, GpRandomGen, Otl*.
uses
DSiWin32,
GpStuff,
GpRandomGen,
OtlCommon,
OtlCollections,
OtlParallel;
{$R *.dfm}
procedure FillBuffer(buf: pointer; bufSize: integer; randomGen: TGpRandom);
var
buf64: PInt64;
buf8 : PByte;
i : integer;
rnd : int64;
begin
buf64 := buf;
for i := 1 to bufSize div SizeOf(int64) do begin
buf64^ := randomGen.Rnd64;
Inc(buf64);
end;
rnd := randomGen.Rnd64;
buf8 := PByte(buf64);
for i := 1 to bufSize mod SizeOf(int64) do begin
buf8^ := rnd AND $FF;
rnd := rnd SHR 8;
Inc(buf8);
end;
end; { FillBuffer }
procedure CreateRandomFile(fileSize: integer; output: TStream);
const
CBlockSize = 1 * 1024 * 1024 {1 MB};
var
buffer : TOmniValue;
lastBufferSize: integer;
memStr : TMemoryStream;
numBuffers : integer;
outQueue : IOmniBlockingCollection;
begin
outQueue := TOmniBlockingCollection.Create;
numBuffers := (fileSize - 1) div CBlockSize + 1;
lastBufferSize := (fileSize - 1) mod CBlockSize + 1;
Parallel.ForEach(1, numBuffers).NoWait
.NumTasks(Environment.Process.Affinity.Count)
.OnStop(
procedure
begin
outQueue.CompleteAdding;
end)
.Initialize(
procedure(var taskState: TOmniValue)
begin
taskState := TGpRandom.Create;
end)
.Finalize(
procedure(const taskState: TOmniValue)
begin
taskState.AsObject.Free;
end)
.Execute(
procedure(const value: integer; var taskState: TOmniValue)
var
buffer : TMemoryStream;
bytesToWrite: integer;
begin
if value = numBuffers then
bytesToWrite := lastBufferSize
else
bytesToWrite := CBlockSize;
buffer := TMemoryStream.Create;
buffer.Size := bytesToWrite;
FillBuffer(buffer.Memory, bytesToWrite, taskState.AsObject as TGpRandom);
outQueue.Add(buffer);
end);
for buffer in outQueue do begin
memStr := buffer.AsObject as TMemoryStream;
output.CopyFrom(memStr, 0);
FreeAndNil(memStr);
end;
end;
procedure TForm43.btnRandomClick(Sender: TObject);
var
fileStr: TFileStream;
time : int64;
begin
time := DSiTimeGetTime64;
try
fileStr := TFileStream.Create('e:\0\random.dat', fmCreate);
try
CreateRandomFile(750*1024*1024, fileStr);
finally FreeAndNil(fileStr); end;
finally Caption := Format('Completed in %d ms', [DSiElapsedTime64(time)]); end;
end;
EDIT: Using ForEach in this case was not really elegant solution so I enhanced OmniThreadLibrary with Parallel.ParallelTask and with better IOmniCounter. Using release 993 (or newer) from the SVN you can solve this multiple-producer-single-consumer problem as follows.
procedure CreateRandomFile(fileSize: integer; output: TStream);
const
CBlockSize = 1 * 1024 * 1024 {1 MB};
var
buffer : TOmniValue;
memStr : TMemoryStream;
outQueue : IOmniBlockingCollection;
unwritten: IOmniCounter;
begin
outQueue := TOmniBlockingCollection.Create;
unwritten := CreateCounter(fileSize);
Parallel.ParallelTask.NoWait
.NumTasks(Environment.Process.Affinity.Count)
.OnStop(Parallel.CompleteQueue(outQueue))
.Execute(
procedure
var
buffer : TMemoryStream;
bytesToWrite: integer;
randomGen : TGpRandom;
begin
randomGen := TGpRandom.Create;
try
while unwritten.Take(CBlockSize, bytesToWrite) do begin
buffer := TMemoryStream.Create;
buffer.Size := bytesToWrite;
FillBuffer(buffer.Memory, bytesToWrite, randomGen);
outQueue.Add(buffer);
end;
finally FreeAndNil(randomGen); end;
end
);
for buffer in outQueue do begin
memStr := buffer.AsObject as TMemoryStream;
output.CopyFrom(memStr, 0);
FreeAndNil(memStr);
end;
end;
EDIT2: A longer blog post about this problem: Life after 2.1: Parallel data production (Introducing Parallel.Task)
I don't know about Delphi, but it might be wasting time on the Random(256)
call. Why don't you handcode something pseudorandom to the effect of
n = (n * 1103515245 + 12345) & 0xff;
Let n
start somewhere and use the recursion, such as this one, to generate next n
. It's not really that random, but it should do for creating random files.
EDIT
Some food for thought. If you are creating this file in hopes that it will not be easily compressible, then the method outlined above isn't that good, because of the & 0xff
part. It's better then to do
n = (n * 1103515245 + 12345) & 0x7fffffff;
as 0x7fffffff = 2147483647
is a prime number. And store the exact larger value of n
, and do a n % 256
on assignment. I've had some good runs with this choice of constants, and much prefer it as entropy source to the in-built .NET alternative, as it's many times faster, and you seldom need truly random or better pseudorandom numbers anyway.
The problem is that Random()
has limited entropy. And if you generate 750MiB of data, you will get only one of the 2^31
possible different strings (since that is the period of the RNG), not 2^(750*1024*1024*8)
, which would be the case if generator was perfect. This is a huge disparity.
In short, if you use Random(), your data is not random at all. Anyone could guess all 750MiB of data from a 4MB sample / piece of the file.
You have to do it different way. If you have linux machine, execute this command from your program:
dd if=/dev/urandom of=file.img bs=1M count=750
It finishes in under a half a minute on my old laptop.
As the Random function has no good distribution anyway, you may reduce your code by nearly a factor of four with the following:
function Generate(buf: Pointer): DWORD; stdcall;
var
i: DWORD;
p: PInteger;
begin
p := buf;
for i := 0 to (keysize div 4) - 1 do begin
p^ := Random(MaxInt);
Inc(p);
end;
Result := 0;
end;
Update: The above code needs about 650ms on my system, while the original code needs about 3s.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With