Hadoop fs -du -h sorting by size for M, G, T, P, E, Z, Y

I am running this command:

sudo -u hdfs hadoop fs -du -h /user | sort -nr 

and the output is not sorted correctly across units (GB, TB, and so on).

I found this command:

hdfs dfs -du -s /foo/bar/*tobedeleted | sort -r -k 1 -g | awk '{ suffix="KMGT"; for(i=0; $1>1024 && i < length(suffix); i++) $1/=1024; print int($1) substr(suffix, i, 1), $3; }' 

but it did not seem to work.

Is there a way, or a command-line flag, that I can use to sort the output so it looks like this:

123T  /xyz
124T  /xyd
126T  /vat
127G  /ayf
123G  /atd

Please help.

Regards,
Mayur

asked Jun 28 '16 by Mayur Narang


3 Answers

hdfs dfs -du -h <PATH> | awk '{print $1$2,$3}' | sort -hr

Short explanation:

  • The hdfs command gets the input data.
  • The awk joins the first two fields (the size and its unit) into a single value like 1.2G, then prints the third field (the path).
  • The -h of sort compares human-readable numbers like 2K or 4G, while -r reverses the sort order (a small worked example follows).
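For illustration, here is the pipeline run on made-up input (the sizes and paths are invented, not real HDFS output), assuming hdfs dfs -du -h prints lines of the form "size unit path"; some Hadoop versions add an extra replicated-size column, which would shift the field numbers:

printf '1.2 G /user/foo\n512.5 M /user/bar\n3.4 G /user/baz\n' | awk '{print $1$2,$3}' | sort -hr

3.4G /user/baz
1.2G /user/foo
512.5M /user/bar

The awk glues "1.2" and "G" into "1.2G", which is exactly the format sort -h knows how to compare.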
answered by Li Su


hdfs dfs -du -h <PATH> | sed 's/ //' | sort -hr

sed will strip out the space between the number and the unit, after which sort will be able to understand it.
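As a rough illustration with made-up sizes and paths (not real HDFS output): note that s/ // removes only the first space on each line, so the space before the path is left alone.

printf '1.2 G /user/foo\n512.5 M /user/bar\n3.4 G /user/baz\n' | sed 's/ //' | sort -hr

3.4G /user/baz
1.2G /user/foo
512.5M /user/bar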

answered by Neil


This is a rather old question, but I stumbled across it while trying to do the same thing. Because you were passing the -h (human-readable) flag, the sizes were converted to different units to make them easier for a human to read. By leaving that flag off we get the aggregate summary of file lengths in plain bytes.

sudo -u hdfs hadoop fs -du -s '/*' | sort -nr

It is not as easy to read, but it means the output can be sorted correctly.

See https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#du for more details.
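If you still want human-readable sizes in the final output, one option (a sketch, assuming GNU coreutils' numfmt is available on the machine you run this from and that the byte count is the first column) is to convert the numbers back after sorting:

sudo -u hdfs hadoop fs -du -s '/*' | sort -nr | numfmt --field=1 --to=iec

This keeps the sort numerically correct while printing sizes like 123T or 127G, similar to the output the question asks for.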

answered by Ben Dalling