I have a file structure that looks like this
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
and I would like to find the file path to the .bin
file in each directory which have the highest number as filename. So the output I am looking for would be
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
The highest number a file can have is 9.
Question
How do I do that in BASH?
I have come as far as find .|grep bin|sort
Globs are guaranteed to be expanded in lexical order.
for dir in ./*/
do
files=($dir/*) # create an array
echo "${files[@]: -1}" # access its last member
done
What about using awk
? You can get the FIRST occurrence really simply:
[ghoti@pc ~]$ cat data1
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' data1
./501.res/1.bin
./503.res/1.bin
./504.res/1.bin
[ghoti@pc ~]$
To get the last occurrence you could pipe through a couple of sorts:
[ghoti@pc ~]$ sort -r data1 | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$
Given that you're using "find" and "grep", you could probably do this:
find . -name \*.bin -type f -print | sort -r | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
How does this work?
The find
command has many useful options, including the ability to select your files by glob, select the type of file, etc. Its output you already know, and that becomes the input to sort -r
.
First, we sort our input data in reverse (sort -r
). This insures that within any directory, the highest numbered file will show up first. That result gets fed into awk. FS is the field separator, which makes $2
into things like "/501", "/502", etc. Awk scripts have sections in the form of condition {action}
which get evaluated for each line of input. If a condition is missing, the action runs on every line. If "1" is the condition and there is no action, it prints the line. So this script is broken out as follows:
a[$2] {next}
- If the array a
with the subscript $2 (i.e. "/501") exists, just jump to the next line. Otherwise...{a[$2]=1}
- set the array a subscript $2 to 1, so that in future the first condition will evaluate as true, then...1
- print the line.The output of this awk script will be the data you want, but in reverse order. The final sort
puts things back in the order you'd expect.
Now ... that's a lot of pipes, and sort can be a bit resource hungry when you ask it to deal with millions of lines of input at the same time. This solution will be perfectly sufficient for small numbers of files, but if you're dealing with large quantities of input, let us know, and I can come up with an all-in-one awk solution (that will take longer than 60 seconds to write).
UPDATE
Per Dennis' sage advice, the awk script I included above could be improved by changing it from
BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1
to
BEGIN{FS="."} $2 in a {next} {a[$2]} 1
While this is functionally identical, the advantage is that you simply define array members rather than assigning values to them, which may save memory or cpu depending on your implementation of awk. At any rate, it's cleaner.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With