How to determine a Hive database's size from Bash or from the Hive CLI?
The hdfs and hadoop commands are also available in Bash.
A database in Hive is a metadata store: it holds information about its tables and has a default location. A table in a database can also be stored anywhere in HDFS if a location is specified when the table is created.
You can see all tables in a database using the show tables command in the Hive CLI.
Then, for each table, you can find its location in HDFS using describe formatted <table name> (again in the Hive CLI).
Last, for each table, you can find its size using hdfs dfs -du -s -h /table/location/
I don't think there's a single command that sums the sizes of all tables in a database. However, it should be fairly easy to write a script that automates the above steps. Hive can also be invoked from Bash using: hive -e '<hive command>'
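The steps above can be sketched as a small shell script. This is only a rough outline, not a tested tool: the function names table_location and database_size_bytes are made up for illustration, it assumes hive and hdfs are on the PATH, and it assumes the describe formatted output contains a line starting with "Location:". Tables without a location (views, for example) are skipped.

```shell
#!/usr/bin/env bash
# Sketch: sum the HDFS sizes of all tables in one Hive database.
# Function names are hypothetical; `hive` and `hdfs` must be on the PATH.

# Print the HDFS location of one table, parsed from `describe formatted`.
# Assumption: the output has a line of the form "Location:  <uri>".
table_location() (
  db="$1"
  table="$2"
  hive -S -e "describe formatted ${db}.${table}" 2>/dev/null |
    awk '$1 ~ /^Location:/ { print $2; exit }'
)

# Print the total size in bytes of every table in the database.
# The body runs in a subshell ( ... ) so its variables stay local.
database_size_bytes() (
  db="$1"
  total=0
  for table in $(hive -S -e "use ${db}; show tables;" 2>/dev/null); do
    location="$(table_location "$db" "$table")"
    if [ -z "$location" ]; then
      continue  # no location found, e.g. a view
    fi
    # `hdfs dfs -du -s` prints the size in bytes as its first field.
    bytes="$(hdfs dfs -du -s "$location" | awk '{ print $1 }')"
    total=$(( total + ${bytes:-0} ))
  done
  echo "$total"
)
```

After sourcing the file, calling database_size_bytes my_db (my_db being a placeholder database name) prints a single byte count. Starting Hive once per table is slow, so for a database with many tables this could take a while.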
Show Hive databases on HDFS
sudo hadoop fs -ls /apps/hive/warehouse
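Show Hive database size (the directory is usually named {db_name}.db, and tables created with a custom location elsewhere in HDFS are not counted)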
sudo hadoop fs -du -s -h /apps/hive/warehouse/{db_name}
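To compare everything under the warehouse directory at once, something like the following might work. It is a sketch only: the function name warehouse_db_sizes is made up, the default path is simply the one used in the commands above, and sudo is left out on the assumption that the current user can read the directory.

```shell
# Sketch: list every entry under the warehouse directory, largest first.
# The function name is hypothetical; pass another path if yours differs.
warehouse_db_sizes() (
  warehouse="${1:-/apps/hive/warehouse}"
  # Without -s, `-du` prints one line per child: size in bytes first, path last.
  hadoop fs -du "$warehouse" | awk '{ print $1 "\t" $NF }' | sort -rn
)
```

Each output line is a byte count, a tab, and the path. Tables of the default database usually sit directly in the warehouse directory, so they show up as separate entries alongside the database directories.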