
How to write this better in Perl

Given a large input file that looks like this:

02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:54:48 Error:java.sql.Exception
02/26/2012 08:56:05 Error:java.sql.Exception
02/26/2012 08:57:21 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
02/26/2012 09:08:48 Error:java.sql.Exception
02/26/2012 09:10:41 Error:java.sql.Exception

I am trying to find out the count of errors per hour; that is, I am looking for an output file that looks like this:

02/26/2012 08 -> 5
02/26/2012 09 -> 3

Here is a script that is working for me:

#!/bin/perl
open(MYFILE, 'tata2');
my %table;
while (<MYFILE>) {
     chomp;
     $dtkey = substr $_, 0, 13;
     $table{$dtkey}++;
}
close(MYFILE); 
for my $key (keys %table) {
    print "$key -> $table{$key}\n";
}

But given Perl’s features, I am pretty sure this can be done in fewer lines. I’d greatly appreciate it if you could provide some examples. I hope they will be useful to others who want to reduce the number of lines of code needed to achieve something.

ring bearer asked Mar 02 '12 16:03


2 Answers

What you have is already fairly short. You can improve things a bit by using lexical file handles and checking the return value of open.

Here is a rewrite using some of Perl's other syntactic features:

open my $fh, '<', 'filename' or die $!;
my %table;

while (<$fh>) {
    $table{$1}++ if /([^:]+)/ # regex is a bit shorter than the substr
}

print "$_ -> $table{$_}\n" for keys %table;  # statement modifier form

Or if you really want it short, how about a one-liner:

perl -lnE '$t{$1}++ if /([^:]+)/; END {say "$_ -> $t{$_}" for keys %t}' infile
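
For readers unfamiliar with the switches, the one-liner behaves roughly like the script below. This is only a sketch, not the exact code perl generates: -n wraps the code in an input loop, -l chomps each line, and -E enables say. The sample data, the in-memory filehandle, and the sort are assumptions added here so the sketch runs on its own and prints in a predictable order.

```perl
use strict;
use warnings;
use feature 'say';    # -E enables this (among other features)

# Sample input taken from the question, read through an in-memory
# filehandle so the sketch runs without an input file (an assumption
# for illustration; the one-liner reads the file named on the command line).
my $sample = <<'END_LOG';
02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
END_LOG
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my %t;
while (my $line = <$in>) {             # -n supplies a loop like this over <>
    chomp $line;                       # -l (together with -n) chomps each line
    $t{$1}++ if $line =~ /([^:]+)/;    # text before the first colon: "MM/DD/YYYY HH"
}
close $in;

# The END block of the one-liner runs once all input has been read.
# sort is an addition: keys returns the hash keys in no particular order.
say "$_ -> $t{$_}" for sort keys %t;
```

With the three sample lines this prints one line for hour 08 with a count of 2 and one for hour 09 with a count of 1.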
Eric Strom answered Oct 26 '22 05:10


You can make effective use of named capture groups, available since Perl 5.10, so that your pattern expresses your intent more clearly, and build keys that sort correctly in the output.

You can dispense with numbers altogether and create named capture groups. The notation is (?<name>...) to declare and \g{name} to reference. (To be compatible with .NET regular expressions, \g{name} may also be written as \k{name}, \k<name> or \k'name'.) name must not begin with a number, nor contain hyphens. When different groups within the same pattern have the same name, any reference to that name assumes the leftmost defined group. Named groups count in absolute and relative numbering, and so can also be referred to by those numbers. (It's possible to do things with named capture groups that would otherwise require (??{}).)

Capture group contents are dynamically scoped and available to you outside the pattern until the end of the enclosing block or until the next successful match, whichever comes first. (See Compound Statements in perlsyn.) You can refer to them by absolute number (using $1 instead of \g1, etc.); or by name via the %+ hash, using $+{name}.
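
As a small illustration of the notation described above, the following sketch declares named groups, reads them back through %+, and uses \g{name} as a backreference. The group names (month, day, year, hour, word) and the second test string are made up for this example.

```perl
use strict;
use warnings;
use 5.010;    # named capture groups and say need Perl 5.10 or later

my $line = '02/26/2012 08:54:38 Error:java.sql.Exception';

# Declare groups with (?<name>...) and read them through %+ after a match.
if ($line =~ m{^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)\s+(?<hour>\d+):}) {
    say "$+{year}/$+{month}/$+{day} $+{hour}";    # prints 2012/02/26 08
    say $3;    # named groups are numbered too; year is the third group: 2012
}

# Inside the pattern, \g{name} refers back to an earlier named group.
say 'repeated word: ', $+{word}
    if 'this is is a test' =~ /\b(?<word>\w+)\s+\g{word}\b/;    # prints repeated word: is
```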

For each line of input, look for a match but permute the components to YYYY/MM/DD HH order for easy sorting.

#! /usr/bin/env perl

use strict;
use warnings;

use 5.10.0;  # named capture buffers

*ARGV = *DATA;  # for demo only; remove for real use

my %hour_errors;
while (<>) {
  $hour_errors{"$+{y}/$+{m}/$+{d} $+{h}"}++
    if m!^ (?<m> \d+) / (?<d> \d+) / (?<y> \d+)  \s+  (?<h> \d+) :!x;
}

print "$_ -> $hour_errors{$_}\n" for sort keys %hour_errors;

__DATA__
02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:54:48 Error:java.sql.Exception
02/26/2012 08:56:05 Error:java.sql.Exception
02/26/2012 08:57:21 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
02/26/2012 09:08:48 Error:java.sql.Exception
02/26/2012 09:10:41 Error:java.sql.Exception

Output:

2012/02/26 08 -> 5
2012/02/26 09 -> 3
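
If the report should keep the MM/DD/YYYY layout shown in the question while still sorting chronologically, one possible variation is to sort on the permuted key but print a separately stored label. This is a sketch under assumptions: the %label hash, the sample lines spanning a year boundary, and the reliance on zero-padded fields (so that string comparison matches chronological order) are all introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample that crosses a year boundary, where sorting the
# original MM/DD/YYYY strings directly would put 2013 before 2012.
my @lines = (
    '01/03/2013 00:10:41 Error:java.sql.Exception',
    '12/31/2012 23:54:38 Error:java.sql.Exception',
    '12/31/2012 23:59:29 Error:java.sql.Exception',
);

my (%count, %label);
for my $line (@lines) {
    next unless $line =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+):};
    my $sort_key = "$3/$1/$2 $4";         # YYYY/MM/DD HH sorts chronologically as a string
    $label{$sort_key} = "$1/$2/$3 $4";    # MM/DD/YYYY HH, the layout used in the question
    $count{$sort_key}++;
}

# Sort on the permuted key, print the original-style label.
my @report = map { "$label{$_} -> $count{$_}" } sort keys %count;
print "$_\n" for @report;
```

Here the 12/31/2012 line is printed before the 01/03/2013 line, even though a plain string sort of the original keys would reverse them.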
Greg Bacon answered Oct 26 '22 05:10