
Get row count from all tables in hive

Tags:

hql

hive

How can I get the row count of every table using Hive? I am interested in the database name, table name, and row count.

Raunak Jhawar asked Feb 20 '14 07:02


2 Answers

You will need to do a

select count(*) from table

for all tables.

To automate this, you can use a small bash script and a few shell commands. First run

$ hive -e 'show tables' | tee tables.txt

This stores the names of all tables in the current database in a text file, tables.txt.

Create a bash file (count_tables.sh) with the following contents.

#!/bin/bash
# Read one table name per line from stdin and print its row count.
while IFS= read -r line
do
 echo "$line"
 hive -e "select count(*) from $line"
done

Now run the following commands.

$ chmod +x count_tables.sh
$ ./count_tables.sh < tables.txt > counts.txt

This creates a text file (counts.txt) with the counts of all the tables in the database.
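
The question also asks for the database name next to each count. As a rough sketch only, the same loop can be wrapped in a function that prints one tab-separated line per table. The function name count_all_tables and the three-column layout are my own choices for illustration, not part of the answer above. The sketch assumes the hive CLI is on the PATH and that the last line of the silent (-S) output is the count.

```shell
#!/bin/bash
# Hypothetical helper: prints "database<TAB>table<TAB>row_count" for each table.
# Usage: count_all_tables my_database > counts.tsv
count_all_tables() {
  local db="$1" table count
  # -S keeps hive quiet so only the query results reach the pipe.
  hive -S -e "use $db; show tables;" | while IFS= read -r table
  do
    # Skip empty lines in the table listing.
    [ -n "$table" ] || continue
    # Redirect stdin so the inner command cannot swallow the table list.
    count=$(hive -S -e "select count(*) from $db.$table;" < /dev/null | tail -n 1)
    printf '%s\t%s\t%s\n' "$db" "$table" "$count"
  done
}
```

Each count(*) still launches its own query, so this is no faster than the script above; it only changes the shape of the output.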

Mukul Gupta answered Oct 04 '22 20:10


A much faster way to get an approximate row count for a table is to run explain on a query against that table. One of the operators in the explain output shows the row count, like below:

TableScan [TS_0] (rows=224910 width=78)

The benefit is that you are not actually spending cluster resources to get that information.


The HQL command is explain select * from table_name; but when the plan is not optimized, the TableScan does not show the row count.
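
If you want to pull that number out in a script, something like the following may work. It is only a sketch: estimate_rows is a made-up name, and the pattern is written against the sample TableScan line shown above. The explain output can be formatted differently on other Hive versions or execution engines, in which case the function prints nothing.

```shell
#!/bin/bash
# Hypothetical helper: prints the rows= estimate from the first TableScan
# line of the explain output, or nothing if no such line is found.
# Usage: estimate_rows my_database.my_table
estimate_rows() {
  local table="$1"
  hive -S -e "explain select * from $table;" \
    | sed -n 's/.*TableScan.*rows=\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}
```

Keep in mind this is an estimate taken from the plan, not an exact count.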

Pratik Khadloya answered Oct 04 '22 21:10