How can I get row count from all tables using hive? I am interested in the database name, table name and row count
You will need to do a
select count(*) from table
for all tables.
To automate this, you can make a small bash script and some bash commands. First run
$hive -e 'show tables' | tee tables.txt
This stores all tables in the database in a text file tables.txt
Create a bash file (count_tables.sh) with the following contents.
while read line
do
echo "$line "
eval "hive -e 'select count(*) from $line'"
done
Now run the following commands.
$chmod +x count_tables.sh
$./count_tables.sh < tables.txt > counts.txt
This creates a text file(counts.txt) with the counts of all the tables in the database
A much faster way to get approximate count of all rows in a table is to run explain on the table. In one of the explain clauses, it shows row counts like below:
TableScan [TS_0] (rows=224910 width=78)
The benefit is that you are not actually spending cluster resources to get that information.
The HQL command is explain select * from table_name;
but when not optimized not shows rows in the TableScan.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With