Human readable, recursive, sorted list of largest files

Tags: linux, shell, unix, posix

What is the best practice for printing a top 10 list of largest files in a POSIX shell? There has to be something more elegant than my current solution:

DIR="."
N=10
LIMIT=512000

find "$DIR" -type f -size +"${LIMIT}k" -exec du {} \; | sort -nr | head -n "$N" | perl -p -e 's/^\d+\s+//' | xargs -I {} du -h {}

where LIMIT is a file size threshold to limit the results of find.


asked Mar 06 '11 20:03

Matti

1 Answer

Edit:

Using GNU utilities (du and sort):

du -0h | sort -zrh | tr '\0' '\n'

This uses a null delimiter to pass information between du and sort and uses tr to convert the nulls to newlines. The nulls allow this pipeline to process filenames which may include newlines. The -h option makes du print sizes in human-readable form, and the -h option makes sort compare those human-readable sizes correctly (so that 2.0G ranks above 12K).
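To illustrate why sort needs its own -h, here is a minimal sketch that runs the same sort on a few made-up lines shaped like du -h output. The helper names sample_du_lines and top_human and the sample paths are hypothetical, and sort -h is assumed to be available (GNU coreutils provides it; other implementations may not).

```shell
#!/bin/sh
# Made-up lines in the shape `du -h` prints: size, tab, path.
sample_du_lines() {
    printf '4.0K\t./notes.txt\n'
    printf '1.5M\t./photo.jpg\n'
    printf '0\t./empty\n'
    printf '2.0G\t./disk.img\n'
    printf '12K\t./script.sh\n'
}

# top_human N: keep the N largest entries read from stdin.
# -h compares by unit suffix first, then by value; -r puts the largest first.
top_human() {
    sort -rh | head -n "$1"
}

sample_du_lines | top_human 3
```

On a real tree, something like find . -type f -exec du -h {} + | top_human 10 would restrict the list to regular files, though this newline-delimited form breaks on filenames that contain newlines.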

Original:

This uses awk to create extra columns for sort keys. It only calls du once. The output should look exactly like du.

I've split it into multiple lines, but it can be recombined into a one-liner.

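# Prefix each line with a sort key (unit rank, zero-padded value), sort on it, then cut the key off.
# awk strings are 1-indexed, so substr starts at 1; a start of 0 can drop a digit in some awks.
du -h |
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    substr($1, 1, length($1)-1), $0}' |
  sort -r | cut -f2-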

Explanation:

index("KMG", ...) - substitutes 1, 2, 3 for K, M, G so that lines group by unit; if there's no unit (the size is less than 1K), then there's no match and a zero is returned (perfect!)
printf - prints the new fields: unit, value (zero-padded and fixed-length so that the alphabetic sort orders it properly) and the original line
the first substr - picks out the last character of the size field, which is the unit
the second substr - pulls out the numeric portion of the size
sort -r orders the results, largest first, and cut discards the extra columns

Try it without the cut command to see what it's doing.
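As a rough sketch of what that intermediate output looks like, the decorate step can be run on its own against a few made-up lines. The function name decorate and the sample paths are hypothetical, and substr is started at 1 here because awk strings are 1-indexed.

```shell
#!/bin/sh
# decorate: prefix each `du -h` style line with "unit-rank zero-padded-value<TAB>".
decorate() {
    awk '{printf "%s %08.2f\t%s\n",
        index("KMG", substr($1, length($1))),
        substr($1, 1, length($1)-1), $0}'
}

printf '4.0K\t./notes.txt\n1.5M\t./photo.jpg\n12K\t./script.sh\n' | decorate
# Prints (with <TAB> standing for a tab character):
# 1 00004.00<TAB>4.0K<TAB>./notes.txt
# 2 00001.50<TAB>1.5M<TAB>./photo.jpg
# 1 00012.00<TAB>12K<TAB>./script.sh
```

Because the key has a fixed width, a plain reverse text sort puts the M line first, then 12K, then 4.0K, and cutting away the first tab-separated field restores the original du lines.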

Edit:

Here's a version which does the sorting within the AWK script and doesn't need cut (requires GNU AWK (gawk) for asorti support):

du -h0 |
   gawk 'BEGIN {RS = "\0"}
        {idx = sprintf("%s %08.2f %s",
         index("KMG", substr($1, length($1))),
         substr($1, 1, length($1)-1), $0);
         lines[idx] = $0}
    END {c = asorti(lines, sorted);
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'

Edit: Added null record separation in order to handle potential filenames which include newlines. Requires GNU du and gawk.
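Since the question asks about a POSIX shell, here is a hedged sketch of a variant that avoids the GNU-only options. It is not part of the answer above: it sorts the plain kilobyte counts from du -k numerically and only converts them to a human-readable suffix at the end. The names humanize and top_files are hypothetical, and the sketch assumes filenames without newlines.

```shell
#!/bin/sh
# humanize: read "<size-in-K> <path>" lines (the shape of `du -k` output) and
# rewrite the size with a K/M/G/T suffix, leaving the path as it was.
humanize() {
    awk '{
        size = $1
        path = $0
        sub(/^[0-9]+[ \t]+/, "", path)   # strip the size column from the line
        unit = "K"
        n = split("M G T", units, " ")
        for (i = 1; i <= n && size >= 1024; i++) {
            size /= 1024
            unit = units[i]
        }
        printf "%.1f%s\t%s\n", size, unit, path
    }'
}

# top_files DIR N: the N largest regular files under DIR, largest first.
top_files() {
    find "$1" -type f -exec du -k {} + | sort -rn | head -n "$2" | humanize
}

# Demo on made-up du -k style lines; on a real tree: top_files . 10
printf '1572864\t./disk.img\n2048\t./photo.jpg\n512\t./my notes.txt\n' | humanize
```

Sorting on the raw numbers first means only the N surviving lines need converting, and the sort itself needs nothing beyond sort -n.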


answered Nov 05 '22 20:11

Dennis Williamson
