Summing a column of numbers in a text file using Perl

Tags:

perl

Ok, so I'm very new to Perl. I have a text file and in the file there are 4 columns of data (date, time, size of files, files). I need to create a small script that can open the file and get the average size of the files. I've read so much online, but I still can't figure out how to do it. This is what I have so far, but I'm not sure if I'm even close to doing this correctly.

#!/usr/bin/perl

open FILE, "files.txt";
#@array = File;

while(FILE){
    #chomp;

    ($date, $time, $numbers, $type) = split(/ /,<FILE>);

    $total += $numbers;

}
print"the total is $total\n";

This is how the data looks in the file. These are just a few of them. I need to get the numbers in the third column.

12/02/2002  12:16 AM              86016 a2p.exe
10/10/2004  11:33 AM               393 avgfsznew.pl
11/01/2003  04:42 PM             38124 c2ph.bat
user1792846 asked Nov 01 '12 23:11


2 Answers

Your program is reasonably close to working. With these changes it will do exactly what you want:

  • Always use use strict and use warnings at the start of your program, and declare all of your variables using my. That will help you by finding many simple errors that you may otherwise overlook

  • Use lexical file handles, the three-parameter form of open, and always check the return status of any open call

  • Declare the $total variable outside the loop. Declaring it inside the loop means it will be created and destroyed each time around the loop and it won't be able to accumulate a total

  • Declare a $count variable in the same way. You will need it to calculate the average

  • Using while (FILE) {...} just tests that FILE is true. You need to read from it instead, so you must use the readline operator like <FILE>

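  • You want the default call to split (without any parameters), which splits $_ on whitespace and returns all the non-space fields as a list. Your split(/ /, ...) splits on every single space, so the runs of spaces in your data produce empty fields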

  • You need to add a variable in the assignment to allow for the AM or PM field in each line
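To make the point about split concrete, here is a minimal sketch that runs both forms on one of the sample lines. The variable names are only for illustration, and split ' ' is used as the explicit spelling of the whitespace mode that a bare split applies to $_.

```perl
use strict;
use warnings;

# One sample line from the question (illustrative copy).
my $line = "10/10/2004  11:33 AM               393 avgfsznew.pl\n";

# split / / breaks on every single space, so each extra space in a
# run of spaces produces an empty field.
my @by_space = split / /, $line;

# split ' ' is the whitespace mode that a bare split applies to $_:
# leading whitespace is skipped and a run of whitespace is one separator.
my @fields = split ' ', $line;

printf "split / / gives %d fields; the third is '%s'\n",
    scalar @by_space, $by_space[2];
printf "split ' ' gives %d fields; the fourth is '%s'\n",
    scalar @fields, $fields[3];
```

With split / / the third field is the time (11:33), which is what the original script was adding to $total. With the whitespace mode the line yields five fields and the fourth is the size, 393.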

Here is a modification of your code that works fine:

use strict;
use warnings;

open my $fh, '<', "files.txt" or die "Cannot open files.txt: $!";

my $total = 0;
my $count = 0;

while (<$fh>) {

    my ($date, $time, $ampm, $numbers, $type) = split;

    $total += $numbers;
    $count += 1;

}

print "The total is $total\n";
print "The count is $count\n";
print "The average is ", $total / $count, "\n";

Output:

The total is 124533
The count is 3
The average is 41511
Borodin answered Sep 20 '22 18:09


It's tempting to use Perl's awk-like auto-split option. There are 5 columns; three containing date and time information, then the size and then the name.

The first version of the script that I wrote is also the most verbose:

perl -n -a -e '$total += $F[3]; $num++; END { printf "%12.2f\n", $total / ($num + 0.0); }'

The -a (auto-split) option splits each line on white space into the array @F. Combined with the -n option (which makes Perl run in a loop that reads the file name arguments in turn, or standard input, without printing each line), the code adds $F[3] (the fourth column, since array indexes start at 0) to $total, which is automagically initialized to zero on first use. It also counts the lines in $num. The END block is executed when all the input is read; it uses printf() to format the value. The + 0.0 ensures that the arithmetic is done in floating point, not integer arithmetic. This is very similar to the awk script:

awk '{ total += $4 } END { print total / NR }'
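If the switches are unfamiliar, the first Perl one-liner behaves roughly like the explicit loop below. This is a sketch of the idea, not the exact code Perl generates. The in-memory sample input and the $in handle are stand-ins so the example runs on its own; with -n, Perl would read the files named on the command line, or standard input.

```perl
use strict;
use warnings;

# Stand-in for the input; with -n Perl would read the files named on the
# command line, or standard input, via while (<>) { ... }.
my $input = <<'SAMPLE';
12/02/2002  12:16 AM              86016 a2p.exe
10/10/2004  11:33 AM               393 avgfsznew.pl
11/01/2003  04:42 PM             38124 c2ph.bat
SAMPLE
open my $in, '<', \$input or die "Cannot open sample input: $!";

# The one-liner relies on undef counting as 0; under strict we declare them.
my ($total, $num) = (0, 0);

while (my $line = <$in>) {        # the loop supplied by -n
    my @F = split ' ', $line;     # the split supplied by -a
    $total += $F[3];              # the code given to -e
    $num++;
}

# The one-liner's END block runs once all input has been read.
printf "%12.2f\n", $total / $num;
```

On the three sample lines from the question this prints 41511.00, right-aligned in a 12-character field.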

First drafts of programs are seldom optimal — or, at least, I'm not that good a programmer. Revisions help.

Perl was designed, in part, as an awk killer. There is still a program a2p distributed with Perl for converting awk scripts to Perl (and there's also s2p for converting sed scripts to Perl). And Perl does have an automatic (built-in) variable that keeps track of the number of lines read. It has several names. The tersest is $.; the mnemonic names $NR and $INPUT_LINE_NUMBER are available if you put use English; in the script. So, using $num is not necessary. It also turns out that Perl does a floating point division anyway, so the + 0.0 part was unnecessary. This leads to the next versions:

perl -MEnglish -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $NR; }'

or:

perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $.; }'

You can tune the print format to suit your whims and fancies. This is essentially the script I'd use in the long term; it is fairly clear without being long-winded in any way. The script could be split over multiple lines if you desired. It is a simple enough task that the legibility of the one-liner is not a problem, IMNSHO. And the beauty of this is that you don't have to futz around with split and arrays and read loops on your own; Perl does most of that for you. (Granted, it does blow up on empty input; that fix is trivial; see below.)

Recommended version

perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $. if $.; }'

The if $. tests whether the number of lines read is zero or not; the printf and division are omitted if $. is zero so the script outputs nothing when given no input.
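The claims above about $., its English names and floating-point division are easy to check with a short script. This is only a sketch; the three-line input is made up, and the in-memory filehandle is there just so it runs without a data file.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # $NR and $INPUT_LINE_NUMBER alias $.

# Three lines of made-up input, read from an in-memory filehandle.
my $data = "a\nb\nc\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# Read to the end of the input; $. counts the lines as they are read.
while (my $line = <$fh>) { }

print '$. is ', $., "\n";
print '$NR is ', $NR, "\n";
print '$INPUT_LINE_NUMBER is ', $INPUT_LINE_NUMBER, "\n";

# Division is floating point unless 'use integer' is in effect.
print "7 / 2 is ", 7 / 2, "\n";
{
    use integer;
    print "with use integer, 7 / 2 is ", 7 / 2, "\n";
}
```

All three line-number variables report 3 here. The plain division prints 3.5, and the one inside the use integer block prints 3.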


There is a noble (or ignoble) game called 'Code Golf' that was much played in the early days of Stack Overflow, but Code Golf questions are no longer considered good questions. The object of Code Golf is to write a program that does a particular task in as few characters as possible. You can play Code Golf with this and compress it still further if you're not too worried about the format of the output and you're using at least Perl 5.10:

perl -Mv5.10 -n -a -e '$total += $F[3]; END { say $total / $. if $.; }'

And, clearly, there are a lot of unnecessary spaces and letters in there:

perl -Mv5.10 -nae '$t+=$F[3];END{say$t/$.if$.}'

That is not, however, as clear as the recommended version.

Jonathan Leffler answered Sep 20 '22 18:09