
How to efficiently list files that have exactly `n` lines?

In order to list files that have exactly n lines, one can do

n=5
find . -name "*.txt" | xargs wc -l | awk -v n=${n} -F" " '{if ($1==n) {print $2} }'

but this solution is quite slow, because it counts every line of every file and only then selects the files that have n lines. A process that counts lines and stops as soon as it reaches n+1 would be much more efficient (especially when dealing with big files that have many lines).

How to efficiently list files that have exactly n lines?

Note: in the special case where every line has exactly the same size, one could probably do

n=5
sizePerLine=500
# The c suffix makes -size count bytes; a bare number means 512-byte blocks.
find . -name '*.txt' -size "$(( n * sizePerLine ))c"
Remi.b asked Dec 30 '25 21:12


1 Answer

I think the following would be faster:

find . -name "*.txt" -exec awk -v n="$n" 'FILENAME != prevfile {if(prevfnr==n) print prevfile} {prevfile = FILENAME; prevfnr = FNR; if(FNR>n) {nextfile;}} END{if (FNR==n) {print FILENAME} }' {} +

How it works:

  • -exec ... {} + makes find run the command itself and pass many file names per invocation
  • awk -v n="$n" invokes awk and defines an awk variable called n with the same value as the shell variable n
  • FILENAME != prevfile {if(prevfnr==n) print prevfile} checks whether the current record comes from a different file than the previous record did; if so, and the previous file had exactly n records, it prints that file's name
  • {prevfile = FILENAME; prevfnr = FNR; if(FNR>n) {nextfile;}} updates prevfile with the current FILENAME and prevfnr with the current FNR. If the current file is already past n records, it jumps to the next file without reading any further
  • END{if (FNR==n) {print FILENAME} } checks at the very end whether the last file also had exactly n records
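The steps above can be tried out on a scratch directory. This is only a sketch: the file names and the temporary directory are made up for the demo, and it assumes the installed awk supports nextfile (gawk does).

```shell
# Scratch directory with three hypothetical test files of 3, 5 and 7 lines.
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/three.txt"
printf '1\n2\n3\n4\n5\n' > "$tmpdir/five.txt"
printf '1\n2\n3\n4\n5\n6\n7\n' > "$tmpdir/seven.txt"

n=5
matches=$(find "$tmpdir" -name '*.txt' -exec awk -v n="$n" '
    # A new file has started: report the previous one if it ended at exactly n records.
    FILENAME != prevfile { if (prevfnr == n) print prevfile }
    {
        prevfile = FILENAME
        prevfnr = FNR
        # More than n records already: skip the rest of this file.
        if (FNR > n) nextfile
    }
    # The last file has no successor to trigger the check above.
    END { if (FNR == n) print FILENAME }
' {} +)

echo "$matches"    # only the path ending in five.txt is listed
rm -rf "$tmpdir"
```

The result does not depend on the order in which find hands over the files: a matching file is reported either when the next file starts or in the END block.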

Interestingly, I found that this gives different results from the version that uses wc -l, though I think this one is arguably more correct. wc -l counts newline characters, so for files in my directory whose last line has no line-ending character it leaves that last "unterminated" line out of the count, whereas awk treats it as a record and the solution here counts it.
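The difference is easy to reproduce with a file whose last line has no trailing newline. A minimal sketch, using a made-up file name in a temporary directory:

```shell
# Scratch directory so the demo does not touch real files.
tmpdir=$(mktemp -d)

# Two lines of text, but only the first one ends with a newline.
printf 'first\nsecond' > "$tmpdir/unterminated.txt"

# wc -l counts newline characters; tr strips the padding some wc versions add.
wc_count=$(wc -l < "$tmpdir/unterminated.txt" | tr -d ' ')

# awk counts records, and the trailing partial line is still a record.
awk_count=$(awk 'END { print NR }' "$tmpdir/unterminated.txt")

echo "wc -l: $wc_count, awk: $awk_count"    # wc -l: 1, awk: 2
rm -rf "$tmpdir"
```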

Argh, I had failed to appreciate that nextfile is a GNU-ism. If I'm already limiting myself to GNU awk, we can make this much cleaner with its ENDFILE pattern:

find . -name '*.txt' -exec awk -v n="$n" 'FNR > n {nextfile} ENDFILE {if (FNR == n) print FILENAME}' {} +

It doesn't seem to me that POSIX awk has a good shortcut to jump to the next file, which is the key to this solution's efficiency.
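If nextfile is not available, one possible workaround (not part of the commands above, just a sketch) is to let head cap how much of each file is read and count what comes through. The file names are hypothetical, and n is exported so the inner sh can see it. Note that this counts with wc -l, so it inherits the unterminated-last-line behaviour, and it starts extra processes per file, so it is likely to pay off only when the files are large.

```shell
# Scratch directory with two hypothetical test files of 5 and 7 lines.
tmpdir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$tmpdir/five.txt"
printf '1\n2\n3\n4\n5\n6\n7\n' > "$tmpdir/seven.txt"

n=5
export n    # make n visible to the sh started by find

portable_matches=$(find "$tmpdir" -name '*.txt' -exec sh -c '
    for f in "$@"; do
        # head stops after n+1 lines, so a big file is never read in full.
        count=$(( $(head -n "$((n + 1))" "$f" | wc -l) ))
        if [ "$count" -eq "$n" ]; then
            printf "%s\n" "$f"
        fi
    done
' sh {} +)

echo "$portable_matches"    # only the path ending in five.txt is listed
rm -rf "$tmpdir"
```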

Eric Renouf answered Jan 02 '26 12:01


