
"grep -c" versus "wc -l"

Tags: grep, bash, wc

I am processing a number of large text files, i.e. converting them all from one format to another. There are some small differences in the original formats of the files, but, with a bit of pre-processing in a few cases, most of them are being converted successfully by a bash shell script I have written.

So far so good, but one thing is puzzling me. At one point the script sets a variable called $iterations, so that it knows how many times to perform a particular for-loop. This value is determined by the number of empty lines in a temporary file that is created by the script.

Thus, the original version of my script contained the line:

    iterations=$(cat tempfile | grep '^$' | wc -l)

This has worked fine so far with all but one of the text files. For that file, $iterations was not set correctly: it got the value '1' even though there appeared to be more than 20,000 empty lines in tempfile.

However, having discovered grep -c, I changed the line to:

    iterations=$(cat tempfile | grep -c '^$')

and the script suddenly worked, i.e. $iterations was set correctly.

Can anyone explain why the two versions produce different results? And why the first version would work on some files and not others? Is there some upper limit value above which wc -l defaults to 1? The file which wouldn't work with the first version is one of the largest, but not the largest in the set (which converted correctly the first time).

John W asked Apr 18 '17 16:04


1 Answer

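If grep decides that its input is binary rather than text (for example, because it contains a NUL byte), it does not print the matching lines. Instead it prints the single line "Binary file (standard input) matches", and wc -l counts that one line, which is why you get 1. grep -c, on the other hand, will happily report the number of matching lines. This is also why only some files are affected: it depends on what the file contains, not on its size, and wc -l has no upper limit.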
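Here is a rough way to reproduce this, as a sketch rather than a diagnosis of your actual file. The demo file and its contents are made up for illustration, and I am assuming GNU grep. Note that, as far as I know, GNU grep 3.5 and later send the "binary file matches" message to standard error instead, so on a recent system the first count may come out as 0 rather than 1; either way it is not the real number of empty lines. The -a option tells grep to treat the input as text.

```shell
#!/usr/bin/env bash
# Hypothetical demo file: three empty lines, plus one line containing a NUL byte.
tmp=$(mktemp)
printf 'a\n\n\nb\0c\n\n' > "$tmp"

# grep sees the NUL byte, treats the input as binary, and suppresses the
# matching lines, so wc -l does not see the three empty lines.
via_wc=$(grep '^$' < "$tmp" | wc -l)

# grep -c counts the matching lines itself, binary input or not.
via_c=$(grep -c '^$' < "$tmp")

# grep -a forces text mode, so the original pipeline counts correctly again.
via_a=$(grep -a '^$' < "$tmp" | wc -l)

echo "grep | wc -l    : $via_wc"
echo "grep -c         : $via_c"
echo "grep -a | wc -l : $via_a"

rm -f "$tmp"
```

If the file really is meant to be plain text, it may be worth finding out where the stray bytes come from. If they are harmless, adding -a to the original grep (or just keeping grep -c) should give consistent counts.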

William Pursell answered Oct 12 '22 01:10