
How to find file in each directory with the highest number as filename?

Tags: linux, bash

I have a file structure that looks like this

./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin

and I would like to find the path to the .bin file with the highest number as its filename in each directory. So the output I am looking for would be

./501.res/1.bin
./503.res/2.bin
./504.res/1.bin

The highest number a file can have is 9.

Question

How do I do that in BASH?

So far I have: find . | grep bin | sort

Sandra Schlichting asked Jun 22 '12 14:06


2 Answers

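Globs are guaranteed to be expanded in sorted order (according to the current locale's collation). Since the file names here are single digits, the last match in each directory is the highest-numbered file.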

for dir in ./*/
do
    files=("$dir"*.bin)      # create an array; $dir already ends in "/"
    echo "${files[@]: -1}"   # access its last member
done
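
The loop above assumes every directory contains at least one .bin file. As a rough sketch of how the same glob-ordering idea could cope with empty directories, here is a variant written in plain POSIX sh (so no arrays or nullglob). The function name highest_bin is made up for this example and is not part of the original answer.

```sh
# Print the highest-sorting .bin file in each subdirectory of a root directory.
# highest_bin is a hypothetical helper name used only for this sketch.
# The body runs in a subshell, so its variables do not leak into the caller.
highest_bin() (
    root=${1:-.}
    for dir in "$root"/*/; do
        last=
        # Globs expand in sorted order, so the last existing match wins
        for f in "$dir"*.bin; do
            # An unmatched glob stays literal, so keep only paths that exist
            if [ -e "$f" ]; then last=$f; fi
        done
        # Skip directories that had no .bin files
        if [ -n "$last" ]; then printf '%s\n' "$last"; fi
    done
)
```

Run from the directory that holds the .res directories, highest_bin . should print one path per directory that contains a .bin file.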

Dennis Williamson answered Nov 14 '22 23:11


What about using awk? You can get the FIRST occurrence really simply:

[ghoti@pc ~]$ cat data1
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' data1
./501.res/1.bin
./503.res/1.bin
./504.res/1.bin
[ghoti@pc ~]$ 

To get the last occurrence you could pipe through a couple of sorts:

[ghoti@pc ~]$ sort -r data1 | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ 

Given that you're using "find" and "grep", you could probably do this:

find . -name \*.bin -type f -print | sort -r | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort

How does this work?

The find command has many useful options, including the ability to select your files by glob, select the type of file, etc. You already know what its output looks like; that output becomes the input to sort -r.

Next, sort -r sorts the input in reverse. This ensures that within any directory, the highest-numbered file shows up first. That result gets fed into awk. FS is the field separator; setting it to "." turns $2 into things like "/501", "/503", etc. Awk scripts have sections in the form condition {action}, which are evaluated for each line of input. If the condition is missing, the action runs on every line. If the action is missing, the default action is to print the line, so a bare 1 (always true) prints every line that reaches it. So this script breaks down as follows:

  • a[$2] {next} - If a[$2] (e.g. a["/501"]) has already been set, this directory has been seen before, so just jump to the next line. Otherwise...
  • {a[$2]=1} - set a[$2] to 1, so that the first condition evaluates as true for any later line from the same directory, then...
  • 1 - print the line.

The output of this awk script will be the data you want, but in reverse order. The final sort puts things back in the order you'd expect.
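
To see the field splitting concretely, here is a small sketch that prints every field awk produces for one path when FS is ".". The helper name show_fields is made up for illustration.

```sh
# show_fields is a hypothetical helper: print each "."-separated field of a path.
show_fields() {
    printf '%s\n' "$1" |
        awk 'BEGIN{FS="."} {for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i}'
}

show_fields ./503.res/2.bin
```

For ./503.res/2.bin this prints $1=[], $2=[/503], $3=[res/2] and $4=[bin], one per line. $1 is empty because the path starts with ".", which is why the script keys on $2. Note that this also means a directory name containing an extra dot would be split differently, so the key would no longer identify the whole directory.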

Now ... that's a lot of pipes, and sort can be a bit resource hungry when you ask it to deal with millions of lines of input at once. This solution is perfectly sufficient for small numbers of files, but if you're dealing with large quantities of input, let me know, and I can come up with an all-in-one awk solution (that will take longer than 60 seconds to write).
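
The answer doesn't include that all-in-one version, but one possible shape for it, as a sketch only, is a single awk pass that remembers the best path per directory and prints them at the end. The function name highest_per_dir and the variable names are invented here. It assumes one path per line and file names that start with their number, and it compares numerically rather than relying on sort order.

```sh
# highest_per_dir is a hypothetical name; it reads paths on stdin, one per line.
highest_per_dir() {
    awk '{
        slash = match($0, "/[^/]*$")        # position of the last "/"
        dir   = substr($0, 1, slash - 1)    # e.g. "./503.res"
        num   = substr($0, slash + 1) + 0   # "2.bin" converts to the number 2
        if (!(dir in best)) {
            order[++n] = dir                # remember first-seen order for output
            best[dir] = $0
            max[dir] = num
        } else if (num > max[dir]) {
            best[dir] = $0                  # a higher number replaces the old best
            max[dir] = num
        }
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] }'
}
```

It could be fed from find in the same way as the pipeline above, for example find . -name '*.bin' -type f | highest_per_dir. Directories come out in the order they were first seen, so no sort is needed unless you want sorted output.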

UPDATE

Per Dennis' sage advice, the awk script I included above could be improved by changing it from

BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1

to

BEGIN{FS="."} $2 in a {next} {a[$2]} 1

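This is functionally identical. Because a bare reference to a[$2] creates the element with an empty value, the condition has to test membership ($2 in a) rather than the value. The advantage is that you simply define array members rather than assigning values to them, which may save memory or CPU depending on your implementation of awk. At any rate, it's cleaner.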
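
If the difference between testing a[$2] and testing $2 in a seems subtle, this little sketch may help. The function name membership_demo and the key "x" are just for illustration.

```sh
# membership_demo is a hypothetical name for this illustration.
membership_demo() {
    awk 'BEGIN {
        if (!("x" in a)) print "before: x not in a"   # "in" does not create a["x"]
        a["x"]                                        # a bare reference creates it
        if ("x" in a) print "after: x in a"
        if (!a["x"]) print "value: a[x] is empty and tests false"
    }'
}

membership_demo
```

All three messages are printed, which shows why the updated script tests membership: the element exists after the bare reference, but its value is empty, so a[$2] on its own would never be true.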

ghoti answered Nov 14 '22 21:11