Is there any fast and memory-efficient way to read specific lines of a large file, without loading it into memory?
I wrote a Perl script that runs many forks, and I would like them to read specific lines from a file.
At the moment I'm using an external command:
sub getFileLine {
    my ( $filePath, $lineWanted ) = @_;
    local $SIG{PIPE} = 'IGNORE';    # head exiting early sends tail a SIGPIPE
    open( my $fh, '-|:utf8', "tail -q -n +$lineWanted \"$filePath\" | head -n 1" )
        or die "Cannot run tail/head on $filePath: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;
    return $line;
}
It's fast and it works, but maybe there's a more "Perl-ish" way that is as fast and as memory-efficient as this one?
As you know, forking a process in Perl duplicates the main process's memory, so if the main process is using 10 MB, each fork will use at least that much.
My goal is to keep the memory use of each fork (and therefore of the main process while forks are running) as low as possible. That's why I don't want to load the whole file into memory.
With the -p switch, Perl wraps a while loop around the code you specify with -e, and -i turns on in-place editing. The current line is in $_; with -p, Perl automatically prints the value of $_ at the end of the loop.
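For example, you can pull out a single line from the shell this way (a sketch; file.txt and line 5000 are made-up values, and -n gives you the same implicit loop as -p but without the automatic print):

perl -ne 'if ($. == 5000) { print; exit }' file.txt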
Before you go further, it's important to understand how fork works. When you fork a process, the OS uses copy-on-write semantics to share the bulk of the parent and child processes' memory; only the memory that differs between the parent and child needs to be separately allocated.
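To see what that means in practice, here is a minimal fork sketch (the child does no real work; it merely inherits the parent's data, and nothing is physically copied until one side writes to a shared page):

use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: shares the parent's pages copy-on-write;
    # reading inherited data allocates no extra memory.
    exit 0;
}
waitpid( $pid, 0 );    # Parent waits for the child to finish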
For reading a single line of a file in Perl, here's a simple way:
open my $fh, '<', $filePath or die "$filePath: $!";
my $line;
while ( <$fh> ) {
    if ( $. == $lineWanted ) {
        $line = $_;
        last;
    }
}
This uses the special $. variable, which holds the current line number of the last filehandle read.
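If you want a drop-in replacement for your getFileLine, the same loop can be wrapped up like this (a sketch; it returns undef when the file has fewer lines than requested):

sub getFileLine {
    my ( $filePath, $lineWanted ) = @_;
    open my $fh, '<:utf8', $filePath or die "$filePath: $!";
    my $line;
    while ( <$fh> ) {
        if ( $. == $lineWanted ) {
            $line = $_;
            last;
        }
    }
    close $fh;
    chomp $line if defined $line;
    return $line;
}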
Take a look at the Tie::File core module.
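For illustration, here is a minimal Tie::File sketch (data.txt and line 5000 are made-up values; note that Tie::File arrays are 0-indexed, and records are fetched lazily rather than slurped into memory):

use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

my ( $filePath, $lineWanted ) = ( 'data.txt', 5000 );

my @lines;
tie @lines, 'Tie::File', $filePath, mode => O_RDONLY
    or die "Cannot tie $filePath: $!";

my $line = $lines[ $lineWanted - 1 ];    # line N of the file is index N - 1

untie @lines;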