In order to list files that have exactly n lines, one can do
n=5
find . -name "*.txt" | xargs wc -l | awk -v n=${n} -F" " '{if ($1==n) {print $2} }'
However, this solution is quite slow, as it first counts every line of every file and only then selects the files that have n lines. A process that counts lines and stops as soon as it reaches n+1 would be much more efficient (especially when dealing with big files that have many lines).
How to efficiently list files that have exactly n lines?
Note: in the special case where every line has exactly the same size, one could probably do
n=5
sizePerLine=500
# the c suffix makes -size count bytes; without a suffix the unit is 512-byte blocks
find . -name '*.txt' -size "$(( n * sizePerLine ))c"
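As a quick check of the size-based idea, here is a small sketch run in a scratch directory. The file names and the 10-byte line length are made up for the demo, and the line size is assumed to include the newline byte.

```shell
# Scratch directory so the demo does not touch real files.
dir=$(mktemp -d)
n=5
sizePerLine=10   # 9 visible characters plus the newline

# five.txt has 5 lines (50 bytes), three.txt has 3 lines (30 bytes).
for i in 1 2 3 4 5; do printf 'line %04d\n' "$i"; done > "$dir/five.txt"
for i in 1 2 3; do printf 'line %04d\n' "$i"; done > "$dir/three.txt"

# The trailing c makes find compare the size in bytes.
matches=$(find "$dir" -name '*.txt' -size "$(( n * sizePerLine ))c")
echo "$matches"

rm -rf "$dir"
```

Only five.txt should be listed. The shortcut never opens the files, so it is fast, but it silently breaks as soon as one line has a different length.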
I think the following would be faster:
find . -name "*.txt" -exec awk -v n="$n" 'FILENAME != prevfile {if(prevfnr==n) print prevfile} {prevfile = FILENAME; prevfnr = FNR; if(FNR>n) {nextfile;}} END{if (FNR==n) {print FILENAME} }' {} +
How it works:
-exec ... {} +
makes find run the command on the files it finds, passing many file names per invocation.

awk -v n="$n"
invokes awk and defines an awk variable called n with the same value as the shell variable n.

FILENAME != prevfile {if(prevfnr==n) print prevfile}
checks whether the current record is in a different file than the previous record was; if so, and the previous file had exactly n records, it prints the name of that file.

{prevfile = FILENAME; prevfnr = FNR; if(FNR>n) {nextfile;}}
updates the prevfile variable with the current FILENAME and the prevfnr variable with the current FNR. If the current file is already past n records, it jumps to the next file without reading any more of this one.

END{if (FNR==n) {print FILENAME} }
checks at the end whether the last file also had exactly n records.

Interestingly, I found that this actually gives different results than the version that uses wc -l, though I think this one is probably more correct. For files in my directory whose last line does not end with a newline character, wc -l reports the number of lines without counting that last "unterminated" line, but the solution here counts it.
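The difference between the two counts is easy to reproduce. The sketch below uses a made-up file name in a scratch directory and compares wc -l with awk's record count on a file whose last line has no trailing newline.

```shell
dir=$(mktemp -d)

# Two lines of text, but no newline after the second one.
printf 'first\nsecond' > "$dir/unterminated.txt"

# wc -l counts newline characters (tr strips the padding some wc versions add).
wc_count=$(wc -l < "$dir/unterminated.txt" | tr -d ' ')

# awk treats the trailing partial line as a record of its own.
awk_count=$(awk 'END { print NR }' "$dir/unterminated.txt")

echo "wc -l: $wc_count, awk: $awk_count"

rm -rf "$dir"
```

Here wc -l reports 1 and awk reports 2, so the two approaches can disagree on exactly which files "have n lines".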
Argh, I had failed to appreciate that nextfile is a GNU-ism. If I'm already limiting myself to GNU awk, we can make this much cleaner with ENDFILE:
find . -name '*.txt' -exec awk -v n="$n" 'FNR > n {nextfile;} ENDFILE{if (FNR==n) {print FILENAME} }' {} +
It doesn't seem to me that POSIX awk has a good shortcut for jumping to the next file, which is the key to this solution's efficiency.
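If nextfile is not available in your awk, one possible workaround is to let head do the early stop. This is only a sketch, not a replacement for the awk answer: list_n_line_files is a hypothetical helper name, it starts two processes per file, and because it relies on wc -l it does not count an unterminated last line.

```shell
# list_n_line_files DIR N (hypothetical helper for this sketch):
# print the *.txt files under DIR that contain exactly N newline-terminated lines.
list_n_line_files() {
    find "$1" -name '*.txt' -exec sh -c '
        n=$1; shift
        for f do
            # head stops reading after n+1 lines, so big files are not scanned in full.
            lines=$(head -n "$((n + 1))" "$f" | wc -l | tr -d "[:space:]")
            if [ "$lines" -eq "$n" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$2" {} +
}
```

Calling list_n_line_files . 5 would then list the matching files under the current directory. For many small files the awk version is likely faster, since it needs far fewer process launches.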