
Is File::Slurp a faster way to write out a file in Perl?

I have a Perl script where I am writing out a very large log file. Currently I write the file out in the 'traditional' Perl way:

open FILE, ">", 'log.txt';
print FILE $line;
.....
close FILE;

I've heard a lot of good things about File::Slurp when reading in files, and how it can vastly improve runtimes. My question is: would using File::Slurp make writing out my log file any faster? I ask because writing out a file in Perl seems pretty simple as it is, and I don't know how File::Slurp could really optimize it any further.

asked Sep 03 '12 by srchulo


People also ask

How do I slurp a file in Perl?

There are several ways in Perl to read an entire file into a string (a procedure also known as "slurping"). If you have access to CPAN, you can use the File::Slurp module:

use File::Slurp;
my $file_content = read_file('text_document.txt');

How do I read an array from a file in Perl?

Use readline in list context. After opening the file, read from the $fh filehandle into an array variable: my @rows = <$fh>;. Perl will then read the content of the whole file in one step, and each row of the file becomes one element of the array.
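
Put together, that idiom might look like this (the file name here is just a placeholder):

# open the file and read every line into an array, one element per line
open my $fh, '<', 'text_document.txt' or die "Cannot open file: $!";
my @rows = <$fh>;
close $fh;

chomp @rows;                            # optionally strip the trailing newlines
print "Read ", scalar @rows, " rows\n";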


1 Answer

The File::Slurp utilities may, under certain circumstances, be fractionally faster overall than the equivalent streamed implementation, but file I/O is so very much slower than anything based solely on memory and CPU speed that it is almost always the limiting resource.

I have never heard any claims that File::Slurp can vastly improve runtimes, and would appreciate seeing a reference to that effect. The only way I could see it being a more efficient solution is if the program requires random access to the file or has to read it multiple times. Because the data is all in memory at once there is no overhead to accessing any of it, but in that case my preference would be for Tie::File, which makes it appear as if the data is all available simultaneously, with little speed impact and far less memory overhead.
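
As a sketch of that Tie::File approach (the file name and line index are arbitrary examples), the lines of a file can be treated as an ordinary Perl array without slurping the whole thing into memory:

use Tie::File;

# tie the array to the file: each element is one line, fetched on demand
tie my @lines, 'Tie::File', 'log.txt' or die "Cannot tie log.txt: $!";

print 'The file has ', scalar @lines, " lines\n";
print "Line 500 is: $lines[499]\n";   # random access without reading the rest

untie @lines;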

In fact it may well be that a call to read_file makes the process seem much slower to the user. If the file is significantly large then the time taken to read all of it and split it into lines may amount to a distinct delay before processing can start, whereas opening a file and reading the first line will usually appear to be instantaneous.
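
A rough illustration of the difference (the file name is arbitrary, and this assumes File::Slurp is installed):

use File::Slurp;

# streamed: the first line is available almost immediately
open my $fh, '<', 'big_log.txt' or die $!;
my $first_line = <$fh>;

# slurped: nothing can be processed until the whole file has been read and split into lines
my @all_lines   = read_file('big_log.txt');
my $first_again = $all_lines[0];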

The same applies at the end of the program. A call to write_file, which combines the data into disk blocks and pages it out to the file, will take substantially longer than simply closing the file.
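
For comparison, a sketch of the slurped version of your log writer (with placeholder data): it holds everything in memory and writes it in a single burst at the end.

use File::Slurp;

# every line is accumulated in memory first...
my @lines = map { "line $_ of the log\n" } 1 .. 1_000_000;

# ...and nothing reaches the disk until this one call at the very end
write_file('log.txt', @lines);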

In general the traditional streaming output method is preferable. It has little or no speed impact, and it avoids data loss by saving the data incrementally instead of waiting until a vast swathe of data has been accumulated in memory before discovering that it cannot be written to disk for one reason or another.

My advice is that you reserve File::Slurp for small files where random access could significantly simplify the program code. Even then there is nothing wrong with

my @data = do {
  open my $fh, '<', 'my_file' or die $!;
  <$fh>;
};

for input, or

open my $fh, '>', 'out_file' or die $!;
print { $fh } $_ for @data;

for output. Particularly in your case, where you are dealing with a very large log file, I think there is no question that you should stick to streamed output methods.

answered Sep 28 '22 by Borodin