
Perl: write speed mystery?

How can the output rate be higher than hard disk write rate?

Update 1: I have changed the following:

  1. Turned off antivirus. No change.

  2. Inserted a new physical disk and used its first partition for the test. (The initial test was on the last partition of the original disk, separate from the system partition, but on the same physical disk.) Result: the same cyclic pattern appears, but the system is no longer unresponsive during the test. The write speed is somewhat higher (which could be due to using the first partition and/or to no longer interfering with the system partition). Preliminary conclusion: there was some kind of interference from the system partition.

  3. Installed 64 bit Perl. The cycles are gone and everything is stable on a 2 second timescale: 55% CPU on the single core, write speed about 65 MB/s.

  4. Tried on the original drive with 64 bit Perl. Result: somewhere in between. Cycles of 8 seconds, CPU 20-50%, 35-65 MB/sec (instead of the deep cycles of 0-100% CPU and 0-120 MB/sec). The system is only mildly unresponsive. The average write speed is 50 MB/sec. This supports the interference theory.

  5. Flushing in the Perl script. Not tried yet.


OK, I got past the first hurdle. I have written a Perl script that can generate a very large text file (e.g. 20 GB) and is essentially just a large number of statements of the form:

print NUMBERS_OUTFILE $line;

where $line is a long string with a "\n" at the end.
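
For illustration, a minimal sketch of this kind of generator loop (the line contents and sizes below are placeholders, not what the real script produces):

#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: write about 1 GB of "\n"-terminated lines with plain print.
# The line contents are placeholder data, not the real numbers.
my $outfile = "numbers.txt";
open NUMBERS_OUTFILE, ">", $outfile or die "could not open $outfile: $!";

my $line  = join(" ", 1 .. 200) . "\n";   # one long line, placeholder data
my $bytes = 0;
while ($bytes < 1024 * 1024 * 1024) {
    print NUMBERS_OUTFILE $line;
    $bytes += length $line;
}

close NUMBERS_OUTFILE or die "could not close $outfile: $!";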

When the Perl script starts, the write rate is about 120 MB/s (consistent between what is computed by the script, Process Explorer, and "IO Write Bytes/sec" for the Perl process in Performance Monitor), with 100% CPU on the single core it is running on. This rate is, I believe, higher than the write speed of the hard disk.
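
As a sanity check on the script-side figure, here is a stand-alone sketch (not the original script) that measures the apparent write rate. It only times how fast print returns, i.e. how fast data enters the buffers, which is why such a figure can briefly exceed the physical write speed of the disk:

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Stand-alone sketch: measure the apparent write rate seen by the script.
# This only times how fast print() returns (data entering the buffers),
# not how fast the data reaches the platters.
my $file = "rate_test.tmp";
open my $fh, ">", $file or die "could not open $file: $!";

my $line  = ("x" x 1023) . "\n";   # 1 KB per print
my $bytes = 0;
my $t0    = time();
for (1 .. 1_000_000) {             # about 1 GB in total
    print {$fh} $line;
    $bytes += length $line;
}
my $elapsed = time() - $t0;
close $fh or die "could not close $file: $!";

printf "%.0f MB in %.2f s = %.1f MB/s\n",
    $bytes / 2**20, $elapsed, $bytes / 2**20 / ($elapsed || 1);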

Then after some time (e.g. 20 seconds and 2.7 GB written) the whole system becomes very unresponsive and the CPU drops to 0%. This lasts for about 30 seconds. The average write speed over these two phases is consistent with the write speed of the hard disk. The times and sizes mentioned in this paragraph vary a lot from run to run; the range 1 GB to 4.3 GB has been observed so far for the first phase. Here is a transcript for the run with 4.3 GB.

There are several of these cycles for a 9.2 GB text file generated in the test.


What is going on?


The full Perl script and the BAT driver script are HTML-formatted with the pre tag. If the two environment variables MBSIZE and OUTFILE are set up, then the Perl script should be able to run unchanged on platforms other than Windows.
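
A sketch of how the script can pick those two variables up in a platform-neutral way (the fallback values here are made up, not taken from the real script):

use strict;
use warnings;

# Read the two environment variables named above; the defaults are
# made-up fallbacks, not values from the real script.
my $mbsize  = defined $ENV{MBSIZE}  ? $ENV{MBSIZE}  : 1024;          # size to write, in MB
my $outfile = defined $ENV{OUTFILE} ? $ENV{OUTFILE} : "numbers.txt"; # output path

die "MBSIZE must be a positive integer, got '$mbsize'\n"
    unless $mbsize =~ /^\d+$/ && $mbsize > 0;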

Platform: Perl 5.10.0 from ActiveState, build 1004 (initially 32-bit, later 64-bit); Windows XP x64 SP2; no page file; 8 GB RAM; AMD quad-core CPU; 500 GB Green Caviar hard disks (write speed 85 MB/s?).

Asked Sep 07 '09 by Peter Mortensen


1 Answer

I am with everyone else who is saying that the problem is buffers filling and then emptying. Try turning on autoflush to avoid having a buffer (in Perl):

#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;

my $filename = "output.txt";

open my $numbers_outfile, ">", $filename
    or die "could not open $filename: $!";

$numbers_outfile->autoflush(1);

#each time through the outer loop writes 1 gig
for (1 .. 20) {
    #each time through the inner loop writes 1 meg
    for (1 .. 1024) {
        #print 1 meg of Zs
        print {$numbers_outfile} "Z" x (1024*1024);
    }
}

close $numbers_outfile or die "could not close $filename: $!";

Buffers can be good if you are going to print a little, do some work, print a little, do some work, and so on. But if you are just going to be blasting data onto disk, they can cause odd behavior. You may also need to disable any write caching your filesystem is doing.
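
If flushing on every print turns out to cost too much, a possible middle ground (a sketch, not tested against this workload) is to keep buffering on but flush explicitly every few megabytes, so the buffers never grow into the gigabyte range:

#!/usr/bin/perl
use strict;
use warnings;

use IO::Handle;

my $filename = "output.txt";

open my $numbers_outfile, ">", $filename
    or die "could not open $filename: $!";

my $chunk       = "Z" x (1024*1024);   # 1 meg per print
my $since_flush = 0;

#20 gigs total, flushed every 16 megs instead of on every print
for (1 .. 20 * 1024) {
    print {$numbers_outfile} $chunk;
    $since_flush += length $chunk;
    if ($since_flush >= 16 * 1024 * 1024) {
        $numbers_outfile->flush or die "could not flush $filename: $!";
        $since_flush = 0;
    }
}

close $numbers_outfile or die "could not close $filename: $!";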

Answered Sep 25 '22 by Chas. Owens