I'm trying to delete a specific line from a 12 GB text file.
I don't have sed's -i option on HP-UX. Other approaches, such as saving to a temporary file, don't work because I only have 20 GB of disk space, and the text file already uses 12 GB of it.
Given that space constraint, I'm trying to do this in Perl.
This solution works for deleting the last 9 lines of the 12 GB file:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my @lines, 'Tie::File', 'test.txt' or die "$!\n";
$#lines -= 9;
untie @lines;
```
I want to modify the above code to delete any specific line number.
Tie::File is never the answer. It's slow, and it can use more memory than simply loading the whole file into memory.
You are encountering both of those problems. Changing $#lines requires Tie::File to know how many lines the file has, so it reads the entire file and stores the offset of every line in memory. This takes 28 bytes per line on a 64-bit build of Perl (not counting any memory-allocator overhead).
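To gauge what that index would cost for a particular file, you can count its lines without loading it. This is a minimal sketch: the count_lines name is made up for illustration, the 28-bytes-per-line constant is simply the figure quoted above, and a final line with no trailing newline isn't counted.

```perl
use strict;
use warnings;

# Figure quoted above for a 64-bit perl; allocator overhead not included.
use constant BYTES_PER_LINE => 28;
use constant BLOCK_SIZE     => 4*1024*1024;

# Count the newlines in $qfn, reading fixed-size blocks so memory use
# stays constant no matter how big the file is.
# (A final line without a trailing newline is not counted.)
sub count_lines {
   my ($qfn) = @_;
   open(my $fh, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   my $count = 0;
   while (1) {
      my $rv = read($fh, my $buf, BLOCK_SIZE);
      die("Can't read \"$qfn\": $!\n") if !defined($rv);
      last if !$rv;
      $count += ($buf =~ tr/\n//);   # tr counts matches without modifying $buf
   }
   return $count;
}

if (@ARGV) {
   my $lines = count_lines($ARGV[0]);
   printf("%d lines; Tie::File's offset index would need roughly %.0f MiB\n",
      $lines, $lines * BYTES_PER_LINE / (1024*1024));
}
```

Run it as perl count_lines.pl test.txt. The count itself costs one sequential pass over the file, which is the same pass Tie::File has to make.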
To delete the last 9 lines of the file, you can use the following:
```perl
use strict;
use warnings;
use File::ReadBackwards qw( );

my $qfn = '...';

my $pos;
{
   my $bw = File::ReadBackwards->new($qfn)
      or die("Can't open \"$qfn\": $!\n");

   # Read (up to) the last 9 lines. Afterwards, tell() returns the
   # offset at which the most recently read line begins.
   for (1..9) {
      defined( my $line = $bw->readline() )
         or last;
   }

   $pos = $bw->tell();
}

# Can't use $bw->get_handle because it's a read-only handle.
truncate($qfn, $pos)
   or die("Can't truncate \"$qfn\": $!\n");
```
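File::ReadBackwards is a CPAN module, not part of core Perl. If it isn't installed, the same idea can be sketched with core functions only: scan backwards from the end of the file in large blocks, counting newlines, then truncate. The sub names below are hypothetical, and like File::ReadBackwards, the sketch treats a final line without a trailing newline as a line.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

# Return the byte offset at which the last $n lines of $fh begin,
# or 0 if the file has $n lines or fewer.
sub start_of_last_n_lines {
   my ($fh, $n) = @_;
   my $end = -s $fh;
   return 0 if !$end;

   # Ignore the final line's terminating newline, if it has one.
   seek($fh, $end - 1, SEEK_SET) or die("seek: $!\n");
   defined( read($fh, my $last_char, 1) ) or die("read: $!\n");
   --$end if $last_char eq "\n";

   my $found = 0;
   while ($end > 0) {
      # Read the block [$start, $end) just before what was already scanned.
      my $start = $end > BLOCK_SIZE ? $end - BLOCK_SIZE : 0;
      my $len   = $end - $start;
      seek($fh, $start, SEEK_SET) or die("seek: $!\n");
      my $rv = read($fh, my $buf, $len);
      die("read: $!\n") if !defined($rv);
      die("Unexpected short read\n") if $rv != $len;

      # Walk this block's newlines from right to left.
      my $i = $len;
      while ($i > 0 && ($i = rindex($buf, "\n", $i - 1)) >= 0) {
         return $start + $i + 1 if ++$found == $n;
      }
      $end = $start;
   }
   return 0;
}

# Truncate $qfn so that its last $n lines are removed.
sub delete_last_lines {
   my ($qfn, $n) = @_;
   return if $n < 1;
   open(my $fh, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   my $pos = start_of_last_n_lines($fh, $n);
   close($fh);
   truncate($qfn, $pos)
      or die("Can't truncate \"$qfn\": $!\n");
}

delete_last_lines($ARGV[0], 9) if @ARGV;
```

Only the tail of the file is read, so the cost depends on the size of the last 9 lines, not on the 12 GB total.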
To delete an arbitrary line, you can use the following:
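```perl
use strict;
use warnings;

my $qfn = '...';

# One handle reads the file while a second handle overwrites the same
# file behind it. The writer never gets ahead of the reader because
# lines are only ever dropped, never added.
open(my $fh_src, '<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");
open(my $fh_dst, '+<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");

while (<$fh_src>) {
   next if $. == 9;  # Or "if /keyword/", or whatever condition you want.

   print($fh_dst $_)
      or die($!);
}

# Cut off the now-unused tail of the file.
truncate($fh_dst, tell($fh_dst))
   or die($!);
close($fh_dst)
   or die($!);
```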
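The same two-handle technique can be packaged as a reusable subroutine that takes the deletion condition as a callback. This is a sketch; delete_lines_where and its (line, line number) callback interface are made up for illustration.

```perl
use strict;
use warnings;

# Rewrite $qfn in place, dropping every line for which
# $should_drop->($line, $line_number) returns true.
sub delete_lines_where {
   my ($qfn, $should_drop) = @_;
   open(my $fh_src, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   open(my $fh_dst, '+<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");

   while (my $line = <$fh_src>) {
      next if $should_drop->($line, $.);
      print($fh_dst $line)
         or die("Can't write \"$qfn\": $!\n");
   }

   # tell() includes any still-buffered output, so it is the final length.
   truncate($fh_dst, tell($fh_dst))
      or die("Can't truncate \"$qfn\": $!\n");
   close($fh_dst)
      or die("Can't close \"$qfn\": $!\n");
   close($fh_src);
}

# Usage: perl delete_line.pl FILE LINE_NUMBER
if (@ARGV == 2) {
   my ($qfn, $line_num) = @ARGV;
   delete_lines_where($qfn, sub { $_[1] == $line_num });
}
```

A keyword-based deletion would pass sub { $_[0] =~ /keyword/ } instead.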
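The following optimized version assumes there's only one line (or block of lines) to remove. Rather than rewriting every line, it reads line by line only until it reaches the line to delete, then shifts the rest of the file down using large unbuffered reads and writes: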
```perl
use strict;
use warnings;
use Fcntl qw( SEEK_CUR SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

my $qfn = 'file';

open(my $fh_src, '<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");
open(my $fh_dst, '+<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");

# Find the offset at which the line to delete starts.
my $dst_pos;
while (1) {
   $dst_pos = tell($fh_src);
   defined( my $line = <$fh_src> )
      or do {
         $dst_pos = undef;   # Reached EOF without finding the line.
         last;
      };

   last if $. == 9;  # Or "if /keyword/", or whatever condition you want.
}

if (defined($dst_pos)) {
   # We're switching from buffered I/O to unbuffered I/O,
   # so we need to move the system file pointer from where the
   # buffered read left off to where we actually finished reading.
   sysseek($fh_src, tell($fh_src), SEEK_SET)
      or die($!);

   sysseek($fh_dst, $dst_pos, SEEK_SET)
      or die($!);

   # Shift everything after the deleted line down over it.
   while (1) {
      my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
      die($!) if !defined($rv);
      last if !$rv;

      # syswrite may write less than requested, so loop until done.
      my $written = 0;
      while ($written < length($buf)) {
         my $rv = syswrite($fh_dst, $buf, length($buf)-$written, $written);
         die($!) if !defined($rv);
         $written += $rv;
      }
   }

   # Must use sysseek instead of tell with sysread/syswrite.
   truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
      or die($!);
}
```
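The code above removes a single line. A sketch of the "block of lines" case, removing lines $first through $last in one pass, only needs to remember where line $first starts and keep reading until line $last. delete_line_range is a hypothetical name; if the file ends before line $last, everything from line $first onward is removed.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_CUR SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

# Remove lines $first..$last (1-based, inclusive) from $qfn in place.
sub delete_line_range {
   my ($qfn, $first, $last) = @_;
   die("Bad line range\n") if $first < 1 || $last < $first;

   open(my $fh_src, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   open(my $fh_dst, '+<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");

   # Record where line $first starts; stop after line $last (or at EOF).
   my $dst_pos;
   while (1) {
      my $pos = tell($fh_src);
      defined( my $line = <$fh_src> )
         or last;
      $dst_pos = $pos if $. == $first;
      last if $. == $last;
   }
   return if !defined($dst_pos);   # File has fewer than $first lines.

   # sysread ignores PerlIO's buffer, so put the OS file pointer where
   # the buffered reads logically stopped.
   sysseek($fh_src, tell($fh_src), SEEK_SET) or die($!);
   sysseek($fh_dst, $dst_pos, SEEK_SET)      or die($!);

   # Shift the rest of the file down over the removed block.
   while (1) {
      my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
      die($!) if !defined($rv);
      last if !$rv;
      my $written = 0;
      while ($written < length($buf)) {
         my $wv = syswrite($fh_dst, $buf, length($buf) - $written, $written);
         die($!) if !defined($wv);
         $written += $wv;
      }
   }

   truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
      or die($!);
   close($fh_dst) or die($!);
}

# Usage: perl delete_range.pl FILE FIRST LAST
delete_line_range(@ARGV) if @ARGV == 3;
```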