How to write this better in Perl

Question

Given a large input file that looks like this:

02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:54:48 Error:java.sql.Exception
02/26/2012 08:56:05 Error:java.sql.Exception
02/26/2012 08:57:21 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
02/26/2012 09:08:48 Error:java.sql.Exception
02/26/2012 09:10:41 Error:java.sql.Exception

I am trying to find out the count of errors per hour; that is, I am looking for an output file that looks like this:

02/26/2012 08 -> 5
02/26/2012 09 -> 3

Here is a script that is working for me:

#!/bin/perl
open(MYFILE, 'tata2');
my %table;
while (<MYFILE>) {
     chomp;
     $dtkey = substr $_, 0, 13;
     $table{$dtkey}++;
}
close(MYFILE); 
for my $key (keys %table) {
    print "$key -> $table{$key}\n";
}

But given Perl’s features, I am pretty sure this can be done in fewer lines. I’d greatly appreciate it if you could provide some examples. I hope they will be useful to others who want to reduce the number of lines needed to get something done.

Eric Strom · Accepted Answer

What you have is already fairly short. You can improve things a bit by using lexical file handles and checking the return value of open.

Here is a rewrite using some of Perl's other syntactic features:

open my $fh, '<', 'filename' or die $!;
my %table;

while (<$fh>) {
    $table{$1}++ if /([^:]+)/ # regex is a bit shorter than the substr
}

print "$_ -> $table{$_}\n" for keys %table;  # statement modifier form

Or if you really want it short, how about a one liner:

perl -lnE '$t{$1}++ if /([^:]+)/; END {say "$_ -> $t{$_}" for keys %t}' infile
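
For readers unfamiliar with the switches, the one-liner behaves roughly like the script below: -n wraps the code in a read loop, -l strips the trailing newline from each input line, and -E enables say. This is only a sketch. The sub name count_by_prefix and the in-memory sample are made up for illustration, and the output is sorted here for readability, which the one-liner does not do.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: counts lines by the text before the first colon,
# which is "MM/DD/YYYY HH" for the log format in the question.
sub count_by_prefix {
    my ($fh) = @_;
    my %t;
    while (my $line = <$fh>) {          # -n supplies this loop
        chomp $line;                    # -l supplies this chomp
        $t{$1}++ if $line =~ /([^:]+)/;
    }
    return \%t;
}

# Illustrative input, read through an in-memory file handle.
my $sample = <<'END_LOG';
02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
END_LOG

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $counts = count_by_prefix($fh);
close $fh;

# The END block of the one-liner corresponds to this final loop
# (sorted here, unlike the one-liner).
say "$_ -> $counts->{$_}" for sort keys %$counts;
```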

Greg Bacon · Answer

You can use named capture groups, available since Perl 5.10, to make the pattern express your intent better, and reorder the captured fields so that the output sorts correctly. The perlre documentation describes them as follows:

You can dispense with numbers altogether and create named capture groups. The notation is (?<name>...) to declare and \g{name} to reference. (To be compatible with .NET regular expressions, \g{name} may also be written as \k{name}, \k<name> or \k'name'.) name must not begin with a number, nor contain hyphens. When different groups within the same pattern have the same name, any reference to that name assumes the leftmost defined group. Named groups count in absolute and relative numbering, and so can also be referred to by those numbers. (It's possible to do things with named capture groups that would otherwise require (??{}).)

Capture group contents are dynamically scoped and available to you outside the pattern until the end of the enclosing block or until the next successful match, whichever comes first. (See Compound Statements in perlsyn.) You can refer to them by absolute number (using $1 instead of \g1, etc); or by name via the %+ hash, using $+{name}.
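
As a small illustration of the notation described above, the sketch below declares a group with (?<name>...), refers back to it inside the pattern with \g{name}, and reads it afterwards through %+. The helper name first_doubled_word and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first immediately repeated word, or
# nothing if there is none.
sub first_doubled_word {
    my ($text) = @_;
    # (?<word>...) declares the group; \g{word} refers back to it.
    if ($text =~ /\b(?<word>\w+)\s+\g{word}\b/) {
        return $+{word};    # same contents as $1, looked up by name
    }
    return;
}

# Named groups are also numbered, so $+{h} and $1 agree here.
my ($by_name, $by_number);
if ('02/26/2012 08:54:38' =~ m{\s(?<h>\d+):}) {
    ($by_name, $by_number) = ($+{h}, $1);
}

print "doubled: ", first_doubled_word('this is is a test'), "\n";
print "hour by name: $by_name, by number: $by_number\n";
```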

For each line of input, look for a match but permute the components to YYYY/MM/DD HH order for easy sorting.

#! /usr/bin/env perl

use strict;
use warnings;

use 5.10.0;  # named capture buffers

*ARGV = *DATA;  # for demo only; remove for real use

my %hour_errors;
while (<>) {
  $hour_errors{"$+{y}/$+{m}/$+{d} $+{h}"}++
    if m!^ (?<m> \d+) / (?<d> \d+) / (?<y> \d+)  \s+  (?<h> \d+) :!x;
}

print "$_ -> $hour_errors{$_}\n" for sort keys %hour_errors;

__DATA__
02/26/2012 08:54:38 Error:java.sql.Exception
02/26/2012 08:54:48 Error:java.sql.Exception
02/26/2012 08:56:05 Error:java.sql.Exception
02/26/2012 08:57:21 Error:java.sql.Exception
02/26/2012 08:59:29 Error:java.sql.Exception
02/26/2012 09:01:14 Error:java.sql.Exception
02/26/2012 09:08:48 Error:java.sql.Exception
02/26/2012 09:10:41 Error:java.sql.Exception

Output:

2012/02/26 08 -> 5
2012/02/26 09 -> 3
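
If the output has to keep the question's MM/DD/YYYY HH layout, one option is to leave the hash keys alone and reorder the fields only inside the sort comparison. The sketch below assumes every key has exactly that form. The helper names sortable and chronological, and the sample counts, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "MM/DD/YYYY HH" into "YYYYMMDDHH" so that a
# plain string comparison orders keys chronologically.
sub sortable {
    my ($key) = @_;
    my ($m, $d, $y, $h) = $key =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+)$}
        or die "Unexpected key format: $key";
    return sprintf '%04d%02d%02d%02d', $y, $m, $d, $h;
}

# Hypothetical helper: returns the given keys in chronological order.
sub chronological {
    my @keys = @_;
    return sort { sortable($a) cmp sortable($b) } @keys;
}

# Illustrative counts, not taken from a real log.
my %table = (
    '01/03/2013 07' => 2,
    '02/26/2012 09' => 3,
    '02/26/2012 08' => 5,
);

print "$_ -> $table{$_}\n" for chronological(keys %table);
```

For large hashes, computing the sortable form once per key (for example with a Schwartzian transform) would avoid repeating the regex match in every comparison.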

Tags:

optimization

hash

coding-style

perl

ring bearer
