
command to print out large files, sorted, with sizes in human readable format

Tags: linux, bash

I've written a simple shell script that finds large files, mostly to save myself some typing. The work is being done with:

find $dir -type f -size +"$size"M -printf '%s %p\n' | sort -rn

I'd like to turn the byte output into a human-readable format. I found ways online to do this manually, e.g.,

find $dir -type f -size +"$size"M -printf '%s %p\n' | sort -rn |
   awk '{ hum[1024**4]="TB"; hum[1024**3]="GB"; hum[1024**2]="MB"; hum[1024]="KB"; hum[0]="B";
      for (x=1024**4; x>=1024; x/=1024){
         if ($1>=x) { printf "%7.2f %s\t%s\n",$1/x,hum[x],$2;break }
      }}'

But this seems messy. I was wondering: is there a standard way to convert bytes into a human-readable form?

Of course, any alternate methods of producing the below output, given a directory and min-size as input, are also welcome:

   1.25 GB      /foo/barf
 598.80 MB      /foo/bar/bazf
 500.58 MB      /bar/bazf
 421.70 MB      /bar/baz/bamf
 ...

Note: This must work on both 2.4 and 2.6, and the output should be sorted.

Christopher Neylan asked Jan 20 '12 14:01


2 Answers

Use du -h and sort -h

find /your/dir -type f -size +5M -exec du -h '{}' + | sort -hr

Explanations:

  • du -h file1 file2 ... prints the disk usage of the given files in human-readable format.
  • sort -hr sorts human-readable numbers in reverse order (largest first).
  • the + terminator of find -exec passes many file names to each du invocation, which reduces the number of invocations and speeds up execution. + can be replaced by ';', which runs du once per file.
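To see why sort -h rather than sort -n is needed once the sizes carry a unit suffix, here is a minimal sketch. The three sizes are made up for illustration, and a sort that supports -h (for example GNU sort) is assumed:

```shell
# Illustrative input only: three made-up sizes in du -h style.
sizes=$(printf '%s\n' 2K 1G 10M)

# Plain numeric sort compares only the leading number and ignores the suffix.
numeric=$(printf '%s\n' "$sizes" | sort -rn)

# Human-numeric sort takes the K/M/G suffix into account.
human=$(printf '%s\n' "$sizes" | sort -hr)

printf 'sort -rn:\n%s\n\nsort -hr:\n%s\n' "$numeric" "$human"
```

With sort -rn the 10M entry ends up ahead of 1G; with sort -hr the 1G entry comes first.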

You can remove the -r option of sort if you want the largest files printed last. You can even use the simpler command below, which drops the size filter, but the output may overflow your terminal's scrollback buffer!

find /your/dir -type f -exec du -h '{}' + | sort -h

Or, if you want just the ten largest files:

find /your/dir -type f -exec du -h '{}' + | sort -hr | head

Note: the -h option of sort was introduced around 2009, so it may not be available on older distributions (such as Red Hat 5). Likewise, the + terminator of find -exec is not available on even older distributions (such as Red Hat 4).
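One way to find out whether a given machine has sort -h is simply to try it. This is a rough sketch, not an official detection method; the variable name have_sort_h is made up for the example:

```shell
# Probe for sort -h by running it on a trivial input; a non-zero exit status
# is taken to mean the option is unsupported.
if printf '1K\n' | sort -h >/dev/null 2>&1; then
    have_sort_h=yes
else
    have_sort_h=no
fi
echo "sort -h available: $have_sort_h"
```

A script could use the result to choose between the du -h | sort -h pipeline and one of the fallbacks for older systems.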


On older distributions, you can use xargs instead of the + terminator of find -exec. The ls command can also print the files sorted by size, but to guarantee the ordering, xargs must invoke ls exactly once. That happens only if the whole file list fits into a single command line, which depends on the total length of all the file names passed to ls.

find /your/dir -type f -size +5M -print0 | xargs -0 ls -1Ssh

(with a little inspiration borrowed from MichaelKrelin-hacker).

Explanations:

  • ls -1 displays one file per line
  • ls -S sorts by file size
  • ls -s prints the allocated size of each file
  • ls -h prints sizes in human readable format
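Whether xargs needs more than one invocation for a given file list can be checked by counting batches. The sketch below uses a hypothetical helper named count_batches and a throwaway directory; neither comes from the command above:

```shell
# Count how many times xargs invokes its command for the files under a directory.
# count_batches is a hypothetical helper name used only for this sketch.
count_batches() {
    # Each invocation of the inner sh prints one line, so wc -l counts batches.
    find "$1" -type f -print0 | xargs -0 sh -c 'echo batch' sh | wc -l | tr -d ' '
}

# Self-contained demo on a throwaway directory with three small files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/file with spaces"
batches=$(count_batches "$demo_dir")
echo "xargs batches: $batches"
rm -rf "$demo_dir"
```

A result of 1 means ls would receive every file name in one call, so its -S ordering covers the whole list. A larger number means the output would consist of several independently sorted chunks.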

The fastest variant is probably the above ls -1Ssh combined with the + terminator of find -exec. As before, the number of files must be small enough for ls to be invoked only once, otherwise the sorting by size is not guaranteed (the + terminator of find -exec batches arguments in much the same way as xargs).

find /your/dir -type f -size +5M -exec ls -1Ssh '{}' +

To reduce the number of files found, raise the size threshold: for instance, replace +5M with +100M.
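Since the question asks for something that takes a directory and a minimum size as input, the du -h and sort -h pipeline could be wrapped in a small function. This is only a sketch: the name bigfiles, the argument order and the defaults are assumptions, and it needs a find with the M size suffix and -exec ... + as well as a sort with -h.

```shell
# bigfiles DIR MIN_MB: list files larger than MIN_MB mebibytes, largest first.
# The function name, argument order and defaults are assumptions for this sketch.
bigfiles() {
    dir=${1:-.}
    min_mb=${2:-5}
    find "$dir" -type f -size +"${min_mb}"M -exec du -h '{}' + | sort -hr
}

# Demo on a throwaway directory: one 3 MiB file and one tiny file.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/big.bin" bs=1024 count=3072 2>/dev/null
echo hello > "$demo_dir/small.txt"
listing=$(bigfiles "$demo_dir" 1)
printf '%s\n' "$listing"
rm -rf "$demo_dir"
```

Because sort -hr sorts the combined output, the ordering holds even when find splits the file list across several du invocations. The sizes shown are disk usage as reported by du, not exact byte counts.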

oHo answered Oct 04 '22 10:10


find ... | sort -rn | cut -d\  -f2 | xargs du -h

for instance :) or

find "$dir" -type f -size +"$size"M -print0 | xargs -0 ls -1hsS

(with a little inspiration borrowed from olibre).

Michael Krelin - hacker answered Oct 04 '22 09:10