
Human readable, recursive, sorted list of largest files

What is the best practice for printing a top 10 list of largest files in a POSIX shell? There has to be something more elegant than my current solution:

DIR="."
N=10
LIMIT=512000

find $DIR -type f -size +"${LIMIT}k" -exec du {} \; | sort -nr | head -$N | perl -p -e 's/^\d+\s+//' | xargs -I {} du -h {}

where LIMIT is a file size threshold to limit the results of find.

Matti asked Mar 06 '11 20:03



1 Answer

Edit:

Using GNU utilities (du and sort):

du -0h | sort -zrh | tr '\0' '\n'

This uses a null delimiter to pass records between du and sort, then uses tr to convert the nulls to newlines for display. The nulls allow the pipeline to process filenames which may include newlines. The -h option makes du print human-readable sizes, and the -h option of sort makes it order those sizes correctly (so that 2G sorts above 900M).
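
To get back to the top-N list of files that the question asks for, the same null-delimited approach can be combined with find. The following is only a sketch and not part of the original answer: top_files is a hypothetical helper name, and it assumes GNU find, du, sort and head (head -z in particular needs a reasonably recent coreutils).

```shell
# top_files DIR N: print the N largest regular files under DIR, largest first.
# Hypothetical helper; assumes GNU du (-0, -h), sort (-z, -h) and head (-z).
top_files() {
  find "$1" -type f -exec du -0h {} + |  # one NUL-terminated "size<TAB>path" per file
    sort -zrh |                          # human-readable sizes, largest first
    head -z -n "$2" |                    # keep the first N NUL-terminated records
    tr '\0' '\n'                         # convert to newlines for display only
}
```

Calling top_files . 10 would then print the ten largest files below the current directory. Because find passes only regular files to du, directory totals do not appear in the list.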

Original:

This uses awk to create extra columns that serve as sort keys. It calls du only once, and the output should look exactly like that of du -h.

I've split it into multiple lines, but it can be recombined into a one-liner.

# awk strings are 1-indexed, so the numeric part starts at position 1
du -h |
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    substr($1, 1, length($1)-1), $0}' |
  sort -r | cut -f2-

Explanation:

  • index into the string "KMG" to substitute 1, 2, 3 for K, M, G for grouping by units; if there's no unit (the size is less than 1K), then there's no match and a zero is returned (perfect!)
  • print the new fields: unit key, value (zero-padded to a fixed length so that the alphabetic sort orders it properly) and the original line
  • the first substr call picks out the last character of the size field, which is the unit
  • the second substr call pulls out the numeric portion of the size
  • sort the results in reverse order, then discard the extra columns with cut

Try it without the cut command to see what it's doing.
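
As an illustration of what that intermediate output looks like, here is a small sketch that runs the awk step on three made-up du -h style lines instead of real du output. The decorate function name and the sample sizes and paths are assumptions for the demo, and the sketch uses a start index of 1 in substr because awk strings are 1-indexed.

```shell
# Hypothetical helper: prefix each "size<TAB>path" line with the two sort keys.
decorate() {
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    substr($1, 1, length($1)-1), $0}'
}

# Made-up sample input; sort -r then orders by unit key first, value second.
printf '19K\t./a\n2.0K\t./b\n1.5M\t./c\n' | decorate | sort -r
```

The first column of each output line is the unit key (2 for M, 1 for K) followed by the zero-padded value, so the 1.5M entry sorts above both K entries and 19K sorts above 2.0K. Piping the result through cut -f2- strips the keys again.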

Edit:

Here's a version which does the sorting within the AWK script and doesn't need cut (requires GNU AWK (gawk) for asorti support):

du -h0 |
   gawk 'BEGIN {RS = "\0"}   # records from du -0 are NUL-terminated
        {idx = sprintf("%s %08.2f %s",
         index("KMG", substr($1, length($1))),
         substr($1, 1, length($1)-1), $0);   # awk strings are 1-indexed
         lines[idx] = $0}
    END {c = asorti(lines, sorted);   # sort the keys as strings
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'

Edit: Added null record separation in order to handle potential filenames which include newlines. Requires GNU du and gawk.

Dennis Williamson answered Nov 05 '22 20:11