
How do I count the characters, words, and lines in a file, using Perl?

What is a good/best way to count the number of characters, words, and lines of a text file using Perl (without using wc)?

NoahD asked Apr 23 '09 14:04




2 Answers

Here's the Perl code. Counting words is somewhat subjective, but here a word is any run of characters that isn't whitespace.

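use strict;
use warnings;

# Three-argument open with a lexical filehandle
open(my $fh, '<', 'file.txt') or die "Could not open file: $!";

my ($lines, $words, $chars) = (0, 0, 0);

while (my $line = <$fh>) {
    $lines++;
    $chars += length($line);         # counts bytes unless the handle has an encoding layer
    my @fields = split(' ', $line);  # ' ' splits on whitespace runs and skips leading whitespace
    $words += @fields;               # an array in numeric context gives its element count
}
close($fh);

print("lines=$lines words=$words chars=$chars\n");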
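
One caveat about the character count: length() returns bytes unless the filehandle decodes its input. The following is a minimal sketch of that difference, not part of the original answer. The helper name count_text and the in-memory sample data are assumptions made for illustration, and the file is assumed to be UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines, words, and characters read from a filehandle.
sub count_text {
    my ($fh) = @_;
    my ($lines, $words, $chars) = (0, 0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $chars += length($line);
        my @fields = split(' ', $line);   # whitespace-separated words
        $words += @fields;
    }
    return ($lines, $words, $chars);
}

# Sample data held in memory so the sketch runs without a file on disk.
# "caf\xC3\xA9" is the UTF-8 encoding of "cafe" with an accented e:
# 4 characters, 5 bytes.
my $data = "caf\xC3\xA9 au lait\n  two words\n";

# Without an encoding layer, length() sees raw bytes.
open(my $raw, '<', \$data) or die "Could not open in-memory file: $!";
my @byte_counts = count_text($raw);
close($raw);

# With an encoding layer, each line is decoded and length() sees characters.
open(my $decoded, '<:encoding(UTF-8)', \$data) or die "Could not open in-memory file: $!";
my @char_counts = count_text($decoded);
close($decoded);

print "raw:     lines=$byte_counts[0] words=$byte_counts[1] chars=$byte_counts[2]\n";
print "decoded: lines=$char_counts[0] words=$char_counts[1] chars=$char_counts[2]\n";
```

For this sample both passes report 2 lines and 5 words, but the raw pass reports 26 and the decoded pass 25, because the accented character takes two bytes. For plain ASCII files the two counts are the same.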
bmdhacks answered Sep 28 '22 17:09


A variation on bmdhacks' answer that will probably produce better results is to use \s+ (or, even better, \W+) as the delimiter. Consider the string "The  quick  brown fox" (note the doubled spaces after "The" and "quick"). Splitting on a single whitespace character gives a word count of six, not four, because each doubled space produces an empty field. So, try:

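# Three-argument open with a lexical filehandle
open(my $fh, '<', 'file.txt') or die "Could not open file: $!";

my ($lines, $words, $chars) = (0, 0, 0);

while (my $line = <$fh>) {
    $lines++;
    $chars += length($line);
    # split yields an empty first field when a line starts with a
    # non-word character, so count only the non-empty fields
    $words += scalar(grep { length } split(/\W+/, $line));
}
close($fh);

print("lines=$lines words=$words chars=$chars\n");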

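Using \W+ as the delimiter stops punctuation (amongst other things) from counting as words. The trade-off is that it also splits inside contractions and hyphenated words, so "don't" counts as two words.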
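
To see how the choice of delimiter changes the result, here is a small sketch that counts the fields split produces for each pattern. The helper name count_fields and the second sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: how many fields does split return for this delimiter?
sub count_fields {
    my ($delimiter, $text) = @_;
    my @fields = split($delimiter, $text);
    return scalar @fields;
}

# The example from above: two spaces after "The" and after "quick".
my $spaced = "The  quick  brown fox";
# A second, made-up sample with free-standing punctuation.
my $punctuated = "Wait - what?";

print "single space: ", count_fields(qr/ /,   $spaced), "\n";  # empty fields are counted
print "\\s+:          ", count_fields(qr/\s+/, $spaced), "\n";
print "\\W+:          ", count_fields(qr/\W+/, $spaced), "\n";

print "\\s+ with punctuation: ", count_fields(qr/\s+/, $punctuated), "\n";  # "-" counts as a word
print "\\W+ with punctuation: ", count_fields(qr/\W+/, $punctuated), "\n";  # "-" and "?" are dropped
```

On the first string the single-space pattern reports six fields while \s+ and \W+ both report four. On the second string \s+ reports three fields and \W+ reports two.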

Nic Gibson answered Sep 28 '22 16:09