
How can I trim log files using Perl?

I recently ran into a situation where I need to trim some rather large log files once they grow beyond a certain size. Everything but the last 1000 lines in each file is discarded, and the job is run every half hour by cron. My solution was simply to run through the list of files, check the size of each, and trim it if necessary.

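for my $file (@fileList) {
  # -s gives the size in bytes; $CSize is the limit in megabytes
  if ( ((-s $file) / (1024 * 1024)) > $CSize ) {
      open my $fh, '<', $file or die "Cannot open ${file}: $!\n";
      my $lineNo = 0;
      my @tLines;

      while (<$fh>) {
        push @tLines, $_;
        # once more than CLLimit lines are buffered, drop the oldest one
        shift @tLines if ++$lineNo > CLLimit;
      }
      close $fh;

      # overwrite the log with only the lines that were kept
      open $fh, '>', $file or die "Cannot write to ${file}: $!\n";
      print $fh @tLines;
      close $fh;
  }
}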

This works in the current form but there is a lot of overhead for large log files (especially the ones with 100_000+ lines) because of the need to read in each line and shift if necessary.

Is there any way I could read in just a portion of the file? In this instance I only need access to the last "CLLimit" lines. Since the script is being deployed on a system that has seen better days (think Celeron 700MHz with 64MB RAM), I am looking for a quicker alternative using Perl.

aks asked Dec 06 '22 07:12


2 Answers

I realize you want to use Perl, but if this is a UNIX system, why not use the "tail" utility to do the trimming? You could do this in Bash with a very simple script:

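# stat -f "%z" is the BSD/macOS form; with GNU coreutils use: stat -c %s "$file"
if [ "$(stat -f "%z" "$file")" -gt "$MAX_FILE_SIZE" ]; then
    tail -n 1000 "$file" > "$file.tmp"
    # copy and then rm (rather than mv) so the log keeps its inode
    cp "$file.tmp" "$file"
    rm "$file.tmp"
fi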

That being said, you would probably find this post very helpful if you're set on using Perl for this.

Jay answered Dec 22 '22 00:12


Estimate the average length of a line in the log - call it N bytes.

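Seek backwards from the end of the file by 1000 * 1.10 * N bytes (the factor 1.10 gives a 10% margin for error). Read forward from there, keeping just the most recent 1000 lines. If that chunk turns out to hold fewer than 1000 lines, the estimate of N was too low, so seek back further and try again.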


The question was asked: which function or module?

The built-in function seek looks to me like the tool to use.
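
For illustration, here is a minimal sketch of the seek approach, not a tested drop-in. The helper name last_lines and its $avg_len parameter (the estimate N above) are made up for this example, and doubling the window when the first guess falls short is just one way to handle a bad estimate.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: return the last $want lines of $path, assuming an
# average line length of about $avg_len bytes, reading only the file's tail.
sub last_lines {
    my ($path, $want, $avg_len) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $size  = -s $fh;
    my $guess = int($want * 1.10 * $avg_len);   # 10% margin for error
    $guess = 1 if $guess < 1;                   # keep the retry loop moving

    while (1) {
        my $start = $size > $guess ? $size - $guess : 0;
        seek($fh, $start, SEEK_SET) or die "Cannot seek in $path: $!\n";

        # We probably landed mid-line, so throw the first line away.
        if ($start > 0) { my $partial = <$fh>; }

        my @lines = <$fh>;
        if (@lines >= $want or $start == 0) {
            close $fh;
            # Keep only the most recent $want lines.
            splice @lines, 0, @lines - $want if @lines > $want;
            return @lines;
        }
        $guess *= 2;    # estimate was too small: widen the window and retry
    }
}
```

To trim a log you would call something like last_lines($file, 1000, 120), with 120 standing in for whatever average line length fits your logs, and then write the returned lines back to the file as in the question. Only the tail of the file is read, so memory use stays close to the size of the lines you keep.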

Jonathan Leffler answered Dec 21 '22 22:12