
How to determine Hive database size?

Tags: bash, hive, hiveql

How can I determine a Hive database's size from Bash or from the Hive CLI?

The hdfs and hadoop commands are also available in Bash.

Aleks Ya asked Feb 05 '26 18:02



2 Answers

A database in Hive is a metadata namespace: it holds information about tables and has a default location. Tables in a database can also be stored anywhere in HDFS if a location is specified when the table is created.

You can list all tables in a database with the show tables command in the Hive CLI.

Then, for each table, you can find its location in HDFS with describe formatted <table name> (again in the Hive CLI).

Finally, you can find each table's size with hdfs dfs -du -s -h /table/location/

I don't think there's a single command that sums the sizes of all tables in a database. However, it should be fairly easy to write a script that automates the steps above. Hive can also be invoked from Bash with: hive -e '<hive command>'
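
A minimal sketch of such a script, assuming hive and hdfs are both on the PATH and that describe formatted prints a "Location:" row for each table (views and anything without a location are skipped). The function names table_location and database_size_bytes are made up for this illustration; they are not part of Hive or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch only: sum the HDFS sizes of all tables in one Hive database.
# Assumes `hive` and `hdfs` are on PATH; function names are hypothetical.

# Print the HDFS location of one table, taken from the "Location:" row
# of DESCRIBE FORMATTED.
table_location() {
  local db="$1" table="$2"
  hive -S -e "DESCRIBE FORMATTED ${db}.${table};" 2>/dev/null |
    awk '$1 == "Location:" { print $2; exit }'
}

# Print the total size in bytes of all tables in the given database.
database_size_bytes() {
  local db="$1" total=0 tables table location bytes
  tables="$(hive -S -e "SHOW TABLES IN ${db};" 2>/dev/null)"
  for table in $tables; do
    location="$(table_location "$db" "$table")"
    # Skip entries with no location (for example, views).
    [ -n "$location" ] || continue
    # The first column of `hdfs dfs -du -s` is the size in bytes.
    bytes="$(hdfs dfs -du -s "$location" | awk '{ print $1; exit }')"
    total=$(( total + ${bytes:-0} ))
  done
  echo "$total"
}
```

After sourcing the file, database_size_bytes my_db prints the total in bytes (my_db is a placeholder). Each hive call starts a new session, so this will be slow for databases with many tables.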

Alex Libov answered Feb 09 '26 10:02



Show Hive databases on HDFS

sudo hadoop fs -ls /apps/hive/warehouse

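Show Hive database size

sudo hadoop fs -du -s -h /apps/hive/warehouse/{db_name}

Here {db_name} is the directory name shown by the listing above (Hive normally names it {db_name}.db). Tables created with a custom location live outside this directory and are not counted.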
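
The warehouse path differs between installations, so it may be safer to ask Hive for it than to hard-code /apps/hive/warehouse. A rough sketch, assuming the database lives in the default {db_name}.db directory under hive.metastore.warehouse.dir; warehouse_dir and database_dir_size are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch only: size of a database directory under the configured warehouse.
# Assumes `hive` and `hdfs` are on PATH; function names are hypothetical.

# Print the configured warehouse directory.
# `SET <property>;` prints a line of the form "<property>=<value>".
warehouse_dir() {
  hive -S -e 'SET hive.metastore.warehouse.dir;' 2>/dev/null |
    sed -n 's/^hive\.metastore\.warehouse\.dir=//p'
}

# Print the human-readable size of <warehouse>/<db>.db.
# Assumption: the database uses the default ".db" directory name.
database_dir_size() {
  local db="$1" warehouse
  warehouse="$(warehouse_dir)"
  [ -n "$warehouse" ] || return 1
  hdfs dfs -du -s -h "${warehouse}/${db}.db"
}
```

After sourcing the file, run database_dir_size my_db (my_db is a placeholder). Prefix the hdfs call with sudo if your user cannot read the warehouse directory.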
Aleks Ya answered Feb 09 '26 10:02
