How can this be done in a more Perl-like way?

Tags:

perl

I'm a novice in Perl, and for one of my homework assignments I came up with a solution like this:

#wordcount.pl FILE 
    # 

    #if no filename is given, print help and exit 
    if (length($ARGV[0]) < 1) 
    { 
           print "Usage is : words.pl word filename\n"; 
           exit; 
    } 

   my $file = $ARGV[0];          #filename given in commandline 

   open(FILE, $file);            #open the mentioned filename 
   while(<FILE>)                 #continue reading until the file ends 
    { 
           chomp; 
           tr/A-Z/a-z/;          #convert all upper case words to lower case 
           tr/.,:;!?"(){}//d;            #remove some common punctuation symbols 
           #We are creating a hash with the word as the key.  
           #Each time a word is encountered, its hash is incremented by 1. 
           #If the count for a word is 1, it is a new distinct word. 
           #We keep track of the number of words parsed so far. 
           #We also keep track of the no. of words of a particular length.  

          foreach $wd (split) 
          { 
                $count{$wd}++; 
                if ($count{$wd} == 1) 
                 { 
                       $dcount++; 
                 } 
                $wcount++; 
                $lcount{length($wd)}++; 
          } 
   } 

   #To print the distinct words and their frequency,  
   #we iterate over the hash containing the words and their count. 
   print "\nThe words and their frequency in the text is:\n"; 
   foreach $w (sort keys%count) 
   { 
         print "$w : $count{$w}\n"; 
   } 

   #For the word length and frequency we use the word length hash 
   print "The word length and frequency in the given text is:\n"; 
   foreach $w (sort keys%lcount) 
   { 
         print "$w : $lcount{$w}\n"; 
   } 

   print "There are $wcount words in the file.\n"; 
   print "There are $dcount distinct words in the file.\n"; 

   $ttratio = ($dcount/$wcount)*100;       #Calculating the type-token ratio. 

   print "The type-token ratio of the file is $ttratio.\n";

I have included comments to explain what it does. The task is to count the words in a given text file. The output of the above program looks like this:

The words and their frequency in the text is: 
1949 : 1
a : 1
adopt : 1
all : 2
among : 1
and : 8
assembly : 1
assuring : 1
belief : 1
citizens : 1
constituent : 1
constitute : 1
.
.
.
The word length and frequency in the given text is:
1 : 1
10 : 5
11 : 2
12 : 2
2 : 15
3 : 18
There are 85 words in the file. 
There are 61 distinct words in the file. 
The type-token ratio of the file is 71.7647058823529.

With the help of Google I was able to find a solution for my homework, but I think there must be shorter, more concise code that uses the real power of Perl. Can anyone give me a solution in Perl with far fewer lines of code?


asked Oct 09 '11 12:10

sriram

1 Answer

Here are several suggestions:

Include `use strict` and `use warnings` in your Perl scripts.
Your argument validation isn't testing what it should be testing: (1) whether there is exactly one item in `@ARGV`, and (2) whether that item is a valid file name.
Although there are exceptions to every rule, it's generally good practice to assign the return value of `<>` to a named variable rather than relying on `$_`. This is particularly true if the code inside the loop might need one of Perl's constructs that also relies on `$_` (for example, `map`, `grep`, or postfix `for` loops).
```
while (my $line = <>){
    ...
}
```
Perl provides a built-in function (`lc`) to lowercase strings.
You are performing unnecessary computations inside the line-reading loop. If you simply build up a tally of words, you'll have all of the information you need. Also note that Perl offers a one-line postfix form for most of its control structures (`for`, `while`, `if`, etc.), as illustrated below.
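```
my %words;    # declared up front so the code compiles under use strict
while (my $line = <>){
    ...
    # split ' ' skips leading whitespace, so no empty "word" is counted
    $words{$_}++ for split ' ', $line;
}
```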
You can then use the word tallies to compute the other information you need. For example, the number of unique words is simply the number of keys in the hash and the total number of words is the sum of the hash values.
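
A minimal sketch of those two computations, assuming the tallies live in a hash named `%words` as in the loop above. The sample counts are made up for illustration, and `sum0` comes from the core `List::Util` module:

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical tallies, standing in for the hash built while reading the file.
my %words = (and => 8, all => 2, adopt => 1);

my $distinct = keys %words;          # keys in scalar context yields the number of keys
my $total    = sum0 values %words;   # sum of the counts; sum0 returns 0 for an empty list

print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";

# Guard against dividing by zero when the input had no words.
printf "The type-token ratio of the file is %.2f.\n", 100 * $distinct / $total
    if $total;
```

This removes the need for the separate `$wcount` and `$dcount` counters in the question's loop.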

The distribution of word lengths can be computed like this:

```
my %lengths;
$lengths{length $_} += $words{$_} for keys %words;
```
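
Putting the suggestions together, one possible shape for the whole script is sketched below. This is not a definitive version: the helper names `check_args` and `tally_words` and the sample text are hypothetical, and the demo reads from an in-memory string so that it runs without a file on disk.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical helper: returns an error message, or nothing when @args
# holds exactly one readable file name.
sub check_args {
    my @args = @_;
    return "Usage: $0 FILE" unless @args == 1;
    return "Cannot read file: $args[0]" unless -f $args[0] && -r $args[0];
    return;
}

# Hypothetical helper: tallies lowercased, punctuation-stripped words
# read from a filehandle.
sub tally_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        $line = lc $line;
        $line =~ tr/.,:;!?"(){}//d;          # same punctuation set as the question
        $words{$_}++ for split ' ', $line;   # split ' ' ignores leading whitespace
    }
    return %words;
}

# Demo on an in-memory string; a real script would first call
# check_args(@ARGV) and then open $ARGV[0] instead.
my $text = "We the People, and all the people.\nAll of us!\n";
open my $fh, '<', \$text or die "Cannot open in-memory text: $!";
my %words = tally_words($fh);
close $fh;

my %lengths;
$lengths{length $_} += $words{$_} for keys %words;

print "$_ : $words{$_}\n"   for sort keys %words;
print "$_ : $lengths{$_}\n" for sort { $a <=> $b } keys %lengths;   # numeric sort

my $distinct = keys %words;
my $total    = sum0 values %words;
print "There are $total words and $distinct distinct words.\n";
```

Note the `{ $a <=> $b }` block when sorting the lengths: a plain `sort` compares strings, which is why the question's output lists length 10 before length 2.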


answered Sep 25 '22 14:09

FMc
