
Hive query to quickly find table size (number of rows)

Tags: hadoop, hive

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).)

I tried DESCRIBE EXTENDED, but that yielded numRows=0 which is obviously not correct.

(Apologies for the newb question. I tried Googling and searching the apache.org documentation without success.)

xenocyon asked Jan 18 '14 19:01


People also ask

How do I count the number of rows in Hive table?

In order to count the number of rows in a table: SELECT COUNT(*) FROM table2; Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

How do you check table stats in Hive?

For a non-partitioned table, you can issue the command ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS; to gather the table's column statistics (Hive 0.10.0 and later).

What is transient_lastDdlTime in Hive?

"transient_lastDdlTime" is the table property that records when a Hive table was last altered.


1 Answer

SHOW TBLPROPERTIES reports the table's size statistics, and you can pass a property name to retrieve just that value.

    -- gives all properties
    show tblproperties yourTableName;

    -- show just the raw data size
    show tblproperties yourTableName("rawDataSize");
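If the properties come back empty or report numRows=0, as in the question, the statistics may never have been gathered for the table. Here is a minimal sketch, assuming a non-partitioned table with the placeholder name yourTableName and assuming your Hive version stores numRows as a table property. Note that ANALYZE itself scans the table, so it is a one-time cost rather than a free lookup; the later reads come from the metastore.

```sql
-- Gather basic table statistics (e.g. numRows, rawDataSize, totalSize, numFiles).
-- This scans the table once and stores the results in the metastore.
ANALYZE TABLE yourTableName COMPUTE STATISTICS;

-- Read back only the stored row count; no full-table COUNT(*) is needed here.
SHOW TBLPROPERTIES yourTableName("numRows");
```

The stored value reflects the table as of the last ANALYZE, or the last statistics-gathering load, so it can be stale if data was added afterwards. Partitioned tables would need a PARTITION clause on the ANALYZE statement.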
Jared answered Sep 17 '22 17:09