I'm a novice in Perl, and for one of my homework assignments I came up with a solution like this:
#wordcount.pl FILE
#
#if no filename is given, print help and exit
if (length($ARGV[0]) < 1)
{
    print "Usage is : words.pl word filename\n";
    exit;
}
my $file = $ARGV[0]; #filename given in commandline
open(FILE, $file); #open the mentioned filename
while(<FILE>) #continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/; #convert all upper case words to lower case
    tr/.,:;!?"(){}//d; #remove some common punctuation symbols
    #We are creating a hash with the word as the key.
    #Each time a word is encountered, its hash is incremented by 1.
    #If the count for a word is 1, it is a new distinct word.
    #We keep track of the number of words parsed so far.
    #We also keep track of the no. of words of a particular length.
    foreach $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
#To print the distinct words and their frequency,
#we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}
#For the word length and frequency we use the word length hash
print "The word length and frequency in the given text is:\n";
foreach $w (sort keys %lcount)
{
    print "$w : $lcount{$w}\n";
}
print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";
$ttratio = ($dcount/$wcount)*100; #Calculating the type-token ratio.
print "The type-token ratio of the file is $ttratio.\n";
I have included comments to explain what it does. The task is to count the words in a given text file. The output of the above program looks like this:
The words and their frequency in the text is:
1949 : 1
a : 1
adopt : 1
all : 2
among : 1
and : 8
assembly : 1
assuring : 1
belief : 1
citizens : 1
constituent : 1
constitute : 1
.
.
.
The word length and frequency in the given text is:
1 : 1
10 : 5
11 : 2
12 : 2
2 : 15
3 : 18
There are 85 words in the file.
There are 61 distinct words in the file.
The type-token ratio of the file is 71.7647058823529.
With the help of Google I was able to find a solution for my homework. However, I think there must be a shorter, more concise way to write it that uses the real power of Perl. Can anyone give me a solution in Perl with far fewer lines of code?
Here are several suggestions:
Include use strict and use warnings in your Perl scripts.
Your argument validation isn't testing what it should be testing: (1) whether there is exactly 1 item in @ARGV, and (2) whether that item is a valid file name.
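A minimal sketch of such a check, assuming the script takes exactly one argument that must name a readable regular file. The helper name check_args is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or undef if the arguments look fine.
sub check_args {
    my @args = @_;
    # (1) exactly one argument
    return "Usage: $0 FILE" unless @args == 1;
    # (2) the argument names a regular file that we can read
    return "Not a readable file: $args[0]" unless -f $args[0] && -r _;
    return;
}

# Typical use near the top of the script:
#   my $err = check_args(@ARGV);
#   die "$err\n" if defined $err;
```

Returning a message instead of exiting inside the helper keeps the check easy to test on its own.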
Although there are exceptions to every rule, it's generally good practice to assign the return from <> to a named variable, rather than relying on $_. This is particularly true if the code inside the loop might need to use one of Perl's constructs that also relies on $_ (for example, map, grep, or postfix for loops).
while (my $line = <>){
...
}
Perl provides a built-in function (lc) to lowercase strings.
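As a small sketch, lc can replace the tr/A-Z/a-z/ step while the punctuation-stripping tr from the question stays as it is. The helper name normalize is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a line and delete the same punctuation
# characters that the question's tr/// removes.
sub normalize {
    my ($line) = @_;
    $line = lc $line;
    $line =~ tr/.,:;!?"(){}//d;
    return $line;
}

print normalize('Hello, World! (Again)'), "\n";   # prints "hello world again"
```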
You are performing unnecessary computations inside the line reading loop. If you simply build up a tally of words, you'll have all of the information you need. Also note that Perl offers a statement-modifier (postfix) form for most of its control structures (for, while, if, etc.), which lets a simple loop or condition fit on one line, as illustrated below.
while (my $line = <>){
    ...
    # split ' ' ignores leading whitespace, so no empty "word" is tallied
    $words{$_}++ for split ' ', $line;
}
You can then use the word tallies to compute the other information you need. For example, the number of unique words is simply the number of keys in the hash and the total number of words is the sum of the hash values.
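For instance, with a small sample hash standing in for the %words tally (the sample data here is invented for illustration), the totals and the type-token ratio from the question could be derived like this:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample tally standing in for the %words hash built by the reading loop.
my %words = (the => 3, cat => 1, sat => 1);

my $distinct = keys %words;            # keys in scalar context gives the number of keys
my $total    = sum(0, values %words);  # the leading 0 keeps the sum defined for an empty hash

print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";
printf "The type-token ratio of the file is %.2f.\n", 100 * $distinct / $total
    if $total;   # skip the ratio for an empty file to avoid dividing by zero
```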
The distribution of word lengths can be computed like this:
my %lengths;
$lengths{length $_} += $words{$_} for keys %words;
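Putting the reading loop and the length tally together, a rough sketch might look like the following. The helper name tally and the in-memory sample text are assumptions for illustration, and a real script would open the file named on the command line instead. Note the numeric sort on the lengths: the plain sort in the question compares strings, which is why its output lists 10, 11 and 12 before 2.

```perl
use strict;
use warnings;

# Hypothetical helper: tally words and word lengths from an open filehandle.
sub tally {
    my ($fh) = @_;
    my (%words, %lengths);
    while (my $line = <$fh>) {
        $line = lc $line;
        $line =~ tr/.,:;!?"(){}//d;
        $words{$_}++ for split ' ', $line;
    }
    # Each distinct word contributes its full count to its length bucket.
    $lengths{length $_} += $words{$_} for keys %words;
    return (\%words, \%lengths);
}

# Demo on an in-memory filehandle with made-up text.
my $text = "The cat sat.\nThe cat ran!\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ($words, $lengths) = tally($fh);
close $fh;

print "$_ : $words->{$_}\n"   for sort keys %$words;
print "$_ : $lengths->{$_}\n" for sort { $a <=> $b } keys %$lengths;   # numeric order
```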