I have recently run into a situation where I need to trim some rather large log files once they grow beyond a certain size. Everything but the last 1000 lines in each file is disposed of; the job is run every half hour by cron. My solution was to simply run through the list of files, check the size, and trim if necessary.
for my $file (@fileList) {
    if ( ((-s $file) / (1024 * 1024)) > $CSize ) {
        open my $fh, '<', $file or die "Cannot open ${file}: $!\n";
        my $lineNo = 0;
        my @tLines;
        while (<$fh>) {
            push @tLines, $_;
            # keep only the most recent $CLLimit lines
            shift @tLines if ++$lineNo > $CLLimit;
        }
        close $fh;
        open my $out, '>', $file or die "Cannot write to ${file}: $!\n";
        print {$out} @tLines;
        close $out;
    }
}
This works in its current form, but there is a lot of overhead for large log files (especially the ones with 100_000+ lines) because every line has to be read in and, when necessary, shifted off the front of the buffer.
Is there any way I could read in just a portion of the file? In this instance I want to access only the last "CLLimit" lines. Since the script is being deployed on a system that has seen better days (think Celeron 700 MHz with 64 MB RAM), I am looking for a quicker alternative using Perl.
I realize you want to use Perl, but if this is a UNIX system, why not use the "tail" utility to do the trimming? You could do this in Bash with a very simple script:
# BSD stat syntax; with GNU coreutils use: stat -c %s "$file"
if [ "$(stat -f '%z' "$file")" -gt "$MAX_FILE_SIZE" ]; then
    tail -n 1000 "$file" > "$file.tmp"
    # copy and then rm (rather than mv) so the original file's inode is preserved
    cp "$file.tmp" "$file"
    rm "$file.tmp"
fi
That being said, you would probably find this post very helpful if you're set on using Perl for this.
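If you want to stay in Perl but still let tail do the heavy lifting, here is a minimal sketch of that approach (the size threshold, line count, and temp-file naming are illustrative, and it assumes a Unix tail is on the PATH):

#!/usr/bin/perl
use strict;
use warnings;

my $max_bytes = 5 * 1024 * 1024;   # illustrative threshold: trim once a log passes 5 MB
my $keep      = 1000;              # lines to keep

for my $file (@ARGV) {
    my $size = -s $file;
    next unless defined $size && $size > $max_bytes;

    my $tmp = "$file.tmp";
    # tail writes the last $keep lines to a temp file; cp then rm (not mv)
    # so the original file keeps its inode and any open filehandles stay valid.
    system("tail -n $keep \Q$file\E > \Q$tmp\E") == 0
        or die "tail failed for $file: $?\n";
    system('cp', $tmp, $file) == 0 or die "cp failed for $file: $?\n";
    unlink $tmp or warn "could not remove $tmp: $!\n";
}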
Estimate the average length of a line in the log; call it N bytes.
Seek backwards from the end of the file by roughly 1000 * 1.10 * N (the factor 1.10 gives a 10% margin for error in the estimate). Read forward from there, keeping just the most recent 1000 lines.
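A minimal sketch of that idea, assuming an average line length you supply yourself (the trim_log name and the 80-byte guess are illustrative; if the estimate is too low you simply end up keeping fewer than the last 1000 lines, so pick the margin generously):

use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Keep the last $keep lines of $file while reading only its tail.
# $avg_line_len is the estimated average line length in bytes.
sub trim_log {
    my ($file, $keep, $avg_line_len) = @_;
    my $margin = int($keep * 1.10 * $avg_line_len);   # 10% slack on the estimate

    open my $in, '<', $file or die "Cannot open $file: $!\n";
    if (-s $in > $margin) {
        seek $in, -$margin, SEEK_END or die "seek failed on $file: $!\n";
        <$in>;   # throw away the (most likely partial) first line after the seek
    }
    my @tail;
    while (<$in>) {
        push @tail, $_;
        shift @tail if @tail > $keep;
    }
    close $in;

    open my $out, '>', $file or die "Cannot write to $file: $!\n";
    print {$out} @tail;
    close $out;
}

trim_log($_, 1000, 80) for @ARGV;   # e.g. guess ~80 bytes per line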
The question was asked: which function or module?
The built-in function seek looks to me like the tool to use.