Delete a specific line from a 12GB file

Tags: file, bash, unix, perl

I'm trying to delete a specific line from a 12GB text file.

sed -i is not available on HP-UX, and approaches that save to a temporary file won't work: I have only 20 GB of disk space, 12 GB of which is already taken up by the text file.

Given the space constraint, I'm trying to do this in Perl.

The following solution works for deleting the last 9 lines of the 12 GB file:

#!/usr/bin/env perl

use strict;
use warnings;

use Tie::File;

tie my @lines, 'Tie::File', 'test.txt' or die "$!\n";
$#lines -= 9;
untie @lines;

I want to modify the above code so that it deletes an arbitrary line, given its line number.

Asked Apr 27 '18 at 17:04 by Vishwanath Dalvi


1 Answer

Tie::File is never the answer.

  • It's insanely slow.
  • It can use up more memory than just slurping the entire file into memory, even if you limit the size of its buffer.

You are encountering both of those problems. Your code touches the end of the file, and Tie::File can only find the last line by scanning all the lines before it, so it will read the entire file and store the index of every line in memory. This takes 28 bytes per line on a 64-bit build of Perl (not counting any overhead in the memory allocator).
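
To put the 28-bytes-per-line figure in perspective, here is a rough back-of-the-envelope estimate. The average line length of 100 bytes is purely an assumption for illustration (the question doesn't say how long the lines are), and index_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rough estimate of the memory needed for a per-line index.
# ASSUMPTION: the average line length passed in is a guess;
# the question does not state it.
sub index_bytes {
   my ($file_size, $avg_line_len, $bytes_per_entry) = @_;
   my $num_lines = int($file_size / $avg_line_len);
   return $num_lines * $bytes_per_entry;
}

my $file_size = 12 * 1024**3;                      # 12 GiB file
my $estimate  = index_bytes($file_size, 100, 28);  # 100-byte lines (assumed)

printf("%.2f GiB for the index alone\n", $estimate / 1024**3);
```

Under that assumption, the index alone comes to about 3.36 GiB, before any allocator overhead. Shorter lines make it proportionally worse.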


To delete the last 9 lines of the file, you can use the following:

use File::ReadBackwards qw( );

my $qfn = '...';

my $pos;
{
   my $bw = File::ReadBackwards->new($qfn)
      or die("Can't open \"$qfn\": $!\n");

   for (1..9) {
      defined( my $line = $bw->readline() )
         or last;
   }

   $pos = $bw->tell();
}

# Can't use $bw->get_handle because it's a read-only handle.
truncate($qfn, $pos)
   or die("Can't truncate \"$qfn\": $!\n");

To delete an arbitrary line, you can use the following:

my $qfn = '...';

open(my $fh_src, '<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");    
open(my $fh_dst, '+<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");

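# Both handles refer to the same file. The write position never gets
# ahead of the read position, so no unread data is overwritten.
while (<$fh_src>) {
   next if $. == 9;  # Or "if /keyword/", or whatever condition you want.

   print($fh_dst $_)
      or die($!);
}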

truncate($fh_dst, tell($fh_dst))
   or die($!);    
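
Since the question asks for deleting a line by number, the same technique could be wrapped in a sub that takes the line number as a parameter. This is only a sketch: the name delete_line and its return value are hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are illustrative only):
# removes line number $line_no (1-based) from $qfn in place.
# Returns true if a line was removed, false if the file is shorter than that.
sub delete_line {
   my ($qfn, $line_no) = @_;

   open(my $fh_src, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   open(my $fh_dst, '+<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");

   my $removed = 0;
   while (my $line = <$fh_src>) {
      # $. is the line counter of the handle last read from ($fh_src).
      if ($. == $line_no) {
         $removed = 1;
         next;
      }

      print($fh_dst $line)
         or die("Can't write to \"$qfn\": $!\n");
   }

   # Cut off the leftover bytes at the end of the file.
   truncate($fh_dst, tell($fh_dst))
      or die("Can't truncate \"$qfn\": $!\n");
   close($fh_dst)
      or die("Can't close \"$qfn\": $!\n");

   return $removed;
}
```

It would be called as delete_line('test.txt', 5). Like the code above, it rewrites the whole file, including the part before the deleted line, so it needs no extra disk space but is not the fastest option.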

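The following optimized version assumes there's only one line (or one contiguous block of lines) to remove. It leaves everything before that line untouched and copies the rest of the file down in large blocks: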

use Fcntl qw( SEEK_CUR SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

my $qfn = 'file';

open(my $fh_src, '<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");
open(my $fh_dst, '+<:raw', $qfn)
   or die("Can't open \"$qfn\": $!\n");

my $dst_pos;
while (1) {
   $dst_pos = tell($fh_src);
   defined( my $line = <$fh_src> )
      or do {
         $dst_pos = undef;
         last;
      };

   last if $. == 9;  # Or "if /keyword/", or whatever condition you want.
}

if (defined($dst_pos)) {
   # We're switching from buffered I/O to unbuffered I/O,
   # so we need to move the system file pointer from where the
   # buffered read left off to where we actually finished reading.
   sysseek($fh_src, tell($fh_src), SEEK_SET)
      or die($!);

   sysseek($fh_dst, $dst_pos, SEEK_SET)
      or die($!);

   while (1) {
      my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
      die($!) if !defined($rv);
      last if !$rv;

      my $written = 0;
      while ($written < length($buf)) {
         my $rv = syswrite($fh_dst, $buf, length($buf)-$written, $written);
         die($!) if !defined($rv);
         $written += $rv;
      }
   }

   # Must use sysseek instead of tell with sysread/syswrite.    
   truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
      or die($!);
}
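
The code above stops at a single line. For a contiguous block of lines, the same approach should work if the copy starts at the end of the block rather than at the end of the first matching line. A sketch under that assumption follows; the name delete_line_range and its interface are hypothetical.

```perl
use strict;
use warnings;

use Fcntl qw( SEEK_CUR SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

# Hypothetical helper (illustrative only): removes lines $first..$last
# (1-based, inclusive) from $qfn in place.
# Returns the number of lines actually removed.
sub delete_line_range {
   my ($qfn, $first, $last) = @_;
   die("Invalid range\n") if $first < 1 || $last < $first;

   open(my $fh_src, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   open(my $fh_dst, '+<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");

   # Find the offset where the block starts; stop reading where it ends.
   my ($dst_pos, $removed) = (undef, 0);
   while (1) {
      my $line_start = tell($fh_src);
      defined( my $line = <$fh_src> )
         or last;

      next if $. < $first;
      $dst_pos = $line_start if !defined($dst_pos);
      ++$removed;
      last if $. >= $last;
   }

   return 0 if !defined($dst_pos);  # The file has fewer than $first lines.

   # Switch to unbuffered I/O: source at the end of the block,
   # destination at its start.
   sysseek($fh_src, tell($fh_src), SEEK_SET)
      or die($!);
   sysseek($fh_dst, $dst_pos, SEEK_SET)
      or die($!);

   while (1) {
      my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
      die($!) if !defined($rv);
      last if !$rv;

      my $written = 0;
      while ($written < length($buf)) {
         my $rv = syswrite($fh_dst, $buf, length($buf)-$written, $written);
         die($!) if !defined($rv);
         $written += $rv;
      }
   }

   truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
      or die($!);

   return $removed;
}
```

For example, delete_line_range('file', 9, 12) would remove lines 9 through 12. If the range extends past the end of the file, only the lines that exist are removed.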

Answered Sep 29 '22 at 11:09 by ikegami