I am processing a number of large text files, i.e. converting them all from one format to another. There are some small differences in the original formats of the files, but - with a bit of pre-processing in a few cases - they are mostly being converted successfully by a Bash shell script I have created.
So far so good, but one thing is puzzling me. At one point the script sets a variable called $iterations, so that it knows how many times to perform a particular for-loop. This value is determined by the number of empty lines in a temporary file that is created by the script.
Thus, the original version of my script contained the line:
iterations=$(cat tempfile | grep '^$' | wc -l)
This has worked fine so far with all but one of the text files, which didn't seem to set the $iterations variable correctly, giving a value of '1' even though there appeared to be more than 20,000 empty lines in tempfile.
However, having discovered grep -c, I changed the line to:
iterations=$(cat tempfile | grep -c '^$')
and the script suddenly worked, i.e. $iterations was set correctly.
Can anyone explain why the two versions produce different results? And why would the first version work on some files and not others? Is there some upper limit above which wc -l defaults to 1? The file which wouldn't work with the first version is one of the largest, but not the largest in the set (which converted correctly the first time).
If the input is not a text file, then grep will print the single line "Binary file (standard input) matches", and wc -l will count that line! But grep -c will happily count the number of matches even in a binary file.
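You can reproduce the behaviour with a small test file (this sketch assumes GNU grep, whose binary detection is triggered by a NUL byte in the input; other grep implementations behave similarly but the exact message may differ):

```shell
# A NUL byte makes GNU grep classify the file as binary.
printf 'data\0here\n\n\ntext\n' > tempfile

# grep prints the single line "Binary file tempfile matches",
# so wc -l reports 1 regardless of how many empty lines exist:
grep '^$' tempfile | wc -l

# grep -c counts the matching lines directly, even in a binary file,
# so it reports 2 (the two empty lines):
grep -c '^$' tempfile
```

As an aside, grep -a (--text) forces grep to process the file as text, so the original wc -l pipeline would also have worked with grep -a '^$'.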