Get row count from all tables in hive

Question

How can I get the row count of every table using Hive? I am interested in the database name, table name, and row count.

Mukul Gupta · Accepted Answer

You will need to do a

select count(*) from table

for all tables.

To automate this, you can write a small bash script and run a few shell commands. First run

$ hive -e 'show tables' | tee tables.txt

This stores the names of all tables in the current database in a text file, tables.txt.

Create a bash file (count_tables.sh) with the following contents.

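#!/bin/bash
# Read one table name per line from stdin and print its row count.
while IFS= read -r table
do
  echo "$table"
  # Redirect stdin so hive cannot consume the table names meant for the loop.
  hive -e "select count(*) from $table" < /dev/null
done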

Now run the following commands.

$ chmod +x count_tables.sh
$ ./count_tables.sh < tables.txt > counts.txt

This creates a text file (counts.txt) with the counts of all the tables in the database.
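
The question also asks for the database name, which the steps above do not record. A minimal sketch of one way to extend them, assuming the hive CLI is on the PATH and that database and table names contain no whitespace: loop over the output of show databases, then over show tables in each database, and print one tab-separated line per table. The function name count_all_tables and the output file counts.tsv are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "database<TAB>table<TAB>row_count" for every table.
# Assumes the `hive` CLI is on PATH; -S (silent mode) keeps log chatter out.
# count_all_tables is a hypothetical helper name, not part of Hive.

count_all_tables() {
  # Word splitting is fine here as long as names contain no whitespace.
  for db in $(hive -S -e 'show databases'); do
    for table in $(hive -S -e "show tables in $db"); do
      # count(*) runs a full job per table, so this can be slow.
      count=$(hive -S -e "select count(*) from $db.$table")
      printf '%s\t%s\t%s\n' "$db" "$table" "$count"
    done
  done
}

# Usage: count_all_tables > counts.tsv
```

Each count still starts its own hive session, so this is no faster than the script above; it only adds the database column.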

Pratik Khadloya · Answer

A much faster way to get an approximate row count for a table is to run explain on a query against it. One of the lines in the explain output shows the row count, like this:

TableScan [TS_0] (rows=224910 width=78)

The benefit is that you are not actually spending cluster resources to get that information.

The HQL command is explain select * from table_name; but when the query is not optimized, the TableScan line does not show rows.
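
To use this from a script, the rows= value can be pulled out of the explain output with sed. The following is a rough sketch that assumes the output contains a line in the format shown above; other Hive versions or execution engines may print the estimate differently, in which case the pattern needs adjusting. The helper names extract_rows and estimated_rows are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: pull the estimated row count out of `explain` output.
# extract_rows and estimated_rows are hypothetical helper names.

# Reads explain output on stdin and prints the first rows=N value found.
# Prints nothing if no line contains rows=N.
extract_rows() {
  sed -n 's/.*rows=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Runs explain for one table and extracts the estimate.
# Assumes the `hive` CLI is on PATH.
estimated_rows() {
  hive -S -e "explain select * from $1" | extract_rows
}

# Usage: estimated_rows my_db.my_table
```

The number is an estimate taken from the query plan, so treat an empty or surprising result as a sign to fall back to select count(*).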

Tags: hql, hive

Raunak Jhawar
