Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).)
I tried DESCRIBE EXTENDED, but that yielded numRows=0, which is obviously not correct.
(Apologies for the newb question. I tried Googling and searching the apache.org documentation without success.)
In order to count the number of rows in a table: SELECT COUNT(*) FROM table2; Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
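For a non-partitioned table, you can issue the command ANALYZE TABLE Table1 COMPUTE STATISTICS; to gather table-level statistics such as numRows, rawDataSize and totalSize, and ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS; to gather column statistics of the table (Hive 0.10.0 and later). ANALYZE itself still reads the data, but once it has run the row count is stored in the metastore and can be read back without launching another job. A numRows of 0 like the one in the question typically means these statistics were never gathered.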
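The ANALYZE statement also accepts an optional PARTITION clause for partitioned tables, where a partition column can be given a fixed value or left bare to cover all of its values. As a minimal sketch, the hypothetical helper `analyze_statement` below only builds the statement text for either case; it does not connect to Hive, and it assumes partition values need no escaping.

```python
def analyze_statement(table, partitions=None, for_columns=False):
    """Build a Hive ANALYZE TABLE statement (hypothetical helper, illustration only)."""
    stmt = f"ANALYZE TABLE {table}"
    if partitions:
        # Map each partition column to a value, or to None to mean
        # "all values of this column" (e.g. PARTITION(ds='...', hr)).
        # Values are inserted as-is: no quoting or escaping is performed.
        specs = [col if val is None else f"{col}='{val}'"
                 for col, val in partitions.items()]
        stmt += f" PARTITION({', '.join(specs)})"
    stmt += " COMPUTE STATISTICS"
    if for_columns:
        # Column-level statistics instead of table-level ones.
        stmt += " FOR COLUMNS"
    return stmt + ";"


print(analyze_statement("Table1"))
print(analyze_statement("Table1", {"ds": "2008-04-08", "hr": None}))
```

The first call produces the table-level form that fills in numRows; the second targets every hr partition under one ds value.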
"transient_lastDdlTime" is the property that records when the table was last altered by a DDL operation, stored as a Unix timestamp in seconds.
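Because the value is a plain epoch-seconds number, it is not human-readable as returned. A minimal sketch for converting it, assuming you already have the property value as a string (the function name `ddl_time_utc` is hypothetical):

```python
from datetime import datetime, timezone


def ddl_time_utc(value):
    """Convert a transient_lastDdlTime value (epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


print(ddl_time_utc("86400").isoformat())
```

Returning a timezone-aware datetime avoids silently interpreting the timestamp in the local time zone of the machine running the script.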
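SHOW TBLPROPERTIES lists the table's properties, including size statistics such as numRows, rawDataSize and totalSize once statistics have been gathered, and it can also fetch a single value if that is all you need:
-- list all properties
SHOW TBLPROPERTIES yourTableName;
-- show just the raw data size
SHOW TBLPROPERTIES yourTableName("rawDataSize");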
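If you capture the output of SHOW TBLPROPERTIES in a script, you can pull the row count out of it without querying the data. The sketch below assumes the output arrives as one `key<TAB>value` pair per line; check this against your own client, since the exact formatting depends on the CLI or driver used. The helper names are hypothetical, and treating a negative numRows as "unknown" is an assumption rather than documented Hive behaviour.

```python
def parse_tblproperties(output):
    """Parse assumed 'key<TAB>value' lines from SHOW TBLPROPERTIES into a dict."""
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        props[key.strip()] = value.strip()
    return props


def row_count(props):
    """Return numRows as an int, or None if no usable statistic is present."""
    value = props.get("numRows")
    if value is None:
        return None  # statistics were never gathered for this table
    n = int(value)
    # Assumption: a negative value means "unknown". Note that 0 is returned
    # as-is even though it may also indicate missing statistics.
    return n if n >= 0 else None


sample = "numFiles\t1\nnumRows\t42\nrawDataSize\t1000\ntotalSize\t1042\n"
print(row_count(parse_tblproperties(sample)))
```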