How to search for lines in a file between two timestamps using Perl?

In Perl, I am trying to read a log file and print only the lines that have a timestamp between two specific times. The time format is hh:mm:ss, and it is always the third field on each log line. For example, I would be searching for lines that fall between 12:52:33 and 12:59:33.

I am new to Perl and have no idea which route to take to even begin programming this. I am pretty sure this would use some type of regex, but for the life of me I cannot begin to fathom what that would be. Could someone please assist me with this?

Also, to make this more difficult, I have to do this with only the core Perl modules, because my company will not allow me to use any other modules until they have been tested and verified to have no ill effects on any of the systems the script may interact with.

Matt Pascoe asked Jun 28 '10 18:06


2 Answers

In pseudocode, you'd do something like this:

  • read in the file line by line:
    • parse the timestamp for this line.
    • if it's less than the start time, skip to the next line.
    • if it's greater than the end time, skip to the next line!
    • else: this is a line you want: print it out.
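The steps above translate fairly directly into a Perl sub. This is a minimal sketch, assuming whitespace-separated fields with a zero-padded hh:mm:ss timestamp in the third field; the sub name lines_between and the sample log lines are made up for illustration. Because zero-padded hh:mm:ss strings sort the same way as the times they represent, plain string comparison (lt, gt) is enough here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose 3rd whitespace-separated
# field is an hh:mm:ss timestamp between $start and $end (inclusive).
sub lines_between {
    my ($start, $end, @lines) = @_;
    my @keep;
    for my $line (@lines) {
        # parse the timestamp for this line; skip lines without one
        my $ts = (split ' ', $line)[2];
        next unless defined $ts && $ts =~ /^\d\d:\d\d:\d\d$/;

        next if $ts lt $start;    # before the window: skip
        next if $ts gt $end;      # after the window: skip
        push @keep, $line;        # inside the window: keep it
    }
    return @keep;
}

# Made-up sample lines standing in for a real log file
my @log = (
    "2010-06-28 host1 12:50:00 boot\n",
    "2010-06-28 host1 12:52:33 start\n",
    "2010-06-28 host1 12:55:10 work\n",
    "2010-06-28 host1 13:01:00 done\n",
);
print lines_between('12:52:33', '12:59:33', @log);
```

With real input, the same sub could be called as lines_between($start, $end, <>), at the cost of reading the whole file into memory.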

This may be too advanced for your needs, but the flip-flop operator (..) immediately comes to mind as something that would be useful here.
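In scalar (boolean) context, .. is false until its left operand becomes true, then stays true until its right operand becomes true; the item that turns it off is still included. A small sketch of those semantics, using plain numbers in place of log lines:

```perl
use strict;
use warnings;

my @picked;
for my $n (1 .. 10) {
    # flip-flop: turns on when $n == 4, turns off after $n == 7
    push @picked, $n if ($n == 4) .. ($n == 7);
}
print "@picked\n";    # prints "4 5 6 7"
```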

For reading a file line by line (from the files named on the command line, or from STDIN if none are given), this is the conventional pattern:

while (my $line = <>)
{
     # do stuff...
}

Parsing a line into fields can be done easily with split (see perldoc -f split). You will probably need to split the line by tabs or spaces, depending on the format.
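One detail worth knowing: split /\s+/ produces an empty leading field when the line starts with whitespace, which would shift the timestamp out of index 2, whereas the special pattern ' ' skips leading whitespace (awk-style). A short sketch with a made-up log line:

```perl
use strict;
use warnings;

my $line = "  Jun28 host1 12:55:10 message text\n";

my @regex_fields = split /\s+/, $line;   # leading whitespace -> empty first field
my @awk_fields   = split ' ', $line;     # leading whitespace is ignored

print "3rd field with /\\s+/: $regex_fields[2]\n";   # host1
print "3rd field with ' ':   $awk_fields[2]\n";      # 12:55:10
```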

Once you have the particular field containing the timestamp, you can examine it with a regular expression. Read about those in perldoc perlre.
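For example, a regular expression with three capture groups can both check that the field looks like hh:mm:ss and pull out its parts. A minimal sketch; parse_hms is a hypothetical name, and the pattern assumes two-digit, zero-padded fields:

```perl
use strict;
use warnings;

# Hypothetical helper: split an hh:mm:ss string into (hour, min, sec),
# or return an empty list if it doesn't look like a timestamp.
sub parse_hms {
    my ($field) = @_;
    return unless defined $field;
    # in list context a successful match returns the captured groups,
    # and a failed match returns an empty list
    return $field =~ /^(\d{2}):(\d{2}):(\d{2})$/;
}

my ($h, $m, $s) = parse_hms('12:52:33');
print "hour=$h min=$m sec=$s\n";    # prints "hour=12 min=52 sec=33"
```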

Here's something which might get you closer:

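use strict;
use warnings;

use POSIX 'mktime';

# mktime needs a full date (sec, min, hour, mday, mon, year); any fixed
# date works (here 1 Jan 2000) because only the time of day is compared
my $starttime = mktime(33, 52, 12, 1, 0, 100);
my $endtime   = mktime(33, 59, 12, 1, 0, 100);

while (my $line = <>)
{
    # split into fields on whitespace (' ' also ignores leading whitespace)
    my @fields = split(' ', $line);

    # the timestamp is the 3rd field; skip lines that don't have one
    my $timestamp = $fields[2];
    next unless defined $timestamp && $timestamp =~ /^\d{2}:\d{2}:\d{2}$/;

    my ($hour, $min, $sec) = split(':', $timestamp);
    my $time = mktime($sec, $min, $hour, 1, 0, 100);

    # keep only lines inside the window (inclusive)
    next unless $time >= $starttime && $time <= $endtime;
    print $line;
}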
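Since all the timestamps are times of day, POSIX's mktime (which needs a full date) isn't strictly required: converting hh:mm:ss to seconds since midnight gives numbers that compare the same way. A sketch of that alternative using only core Perl; to_seconds and in_window are hypothetical helper names, and the third-whitespace-field layout is assumed as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "hh:mm:ss" to seconds since midnight.
sub to_seconds {
    my ($hms) = @_;
    my ($h, $m, $s) = split /:/, $hms;
    return $h * 3600 + $m * 60 + $s;
}

my $starttime = to_seconds('12:52:33');    # 46353
my $endtime   = to_seconds('12:59:33');    # 46773

# Hypothetical helper: true if the line's 3rd whitespace-separated field
# is a timestamp inside [$starttime, $endtime].
sub in_window {
    my ($line) = @_;
    my $ts = (split ' ', $line)[2];
    return 0 unless defined $ts && $ts =~ /^\d\d:\d\d:\d\d$/;
    my $t = to_seconds($ts);
    return $t >= $starttime && $t <= $endtime;
}
```

Inside the while (my $line = <>) loop this would be used as: print $line if in_window($line);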
Ether answered Sep 24 '22 19:09


If the start and end times are known in advance and appear verbatim in the log, a Perl one-liner with the flip-flop operator is what you need:

perl -ne 'print if /12:52:33/../12:59:33/' logFile

If some underlying logic is needed to determine the start and end times, 'unroll' the one-liner into a full script:

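use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    # flip-flop: true from the line matching $startTime through
    # the line matching $endTime, inclusive
    print if /$startTime/../$endTime/;
}

# Placeholders: replace with whatever logic determines the window
sub get_start_time { return '12:52:33' }
sub get_end_time   { return '12:59:33' }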
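To try the flip-flop version without a real log file, the loop can be fed from an in-memory filehandle (opening a reference to a scalar is supported by core Perl). A sketch under that assumption; range_lines and the sample lines are made up for illustration. \Q...\E makes the times match literally, although hh:mm:ss contains no regex metacharacters anyway. Note that each .. keeps its own state even across calls to the sub containing it, so if the end time is never seen, a later call starts out already "on".

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from $fh between the first line
# matching $start and the next line matching $end, inclusive.
sub range_lines {
    my ($fh, $start, $end) = @_;
    my @out;
    while (my $line = <$fh>) {
        push @out, $line if $line =~ /\Q$start\E/ .. $line =~ /\Q$end\E/;
    }
    return @out;
}

# Made-up sample log held in a string
my $log = join '', map { "host1 app $_ msg\n" }
    qw(12:50:00 12:52:33 12:55:10 12:59:33 13:01:00);

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
print range_lines($fh, '12:52:33', '12:59:33');
```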

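As noted in Ether's comment, this will fail if the exact times are not present: if the start time never appears, nothing is printed, and if the end time never appears, everything from the start time to the end of the file is printed. If this is a possibility, one might implement the following logic instead: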

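use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    chomp;
    my $time = (split /,/, $_)[2];      # Assuming fields are comma-separated
                                        # and the timestamp is the 3rd field
    next unless defined $time;          # Skip lines with fewer than 3 fields

    # Zero-padded hh:mm:ss strings compare correctly with gt/ge
    last  if $time gt $endTime;         # Stop once past the end time (assumes
                                        # the log is in chronological order)
    print "$_\n" if $time ge $startTime;
}

# Placeholders: replace with whatever logic determines the window
sub get_start_time { return '12:52:33' }
sub get_end_time   { return '12:59:33' }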
Zaid answered Sep 22 '22 19:09