
Any way to compute statistics on a hive table for all partitions with a single analyze command?

The syntax I see for computing statistics in hive seems to indicate the answer to the title question would be 'no':

ANALYZE TABLE [TABLENAME] PARTITION(partcol1=…, partcol2=…) COMPUTE STATISTICS

However, I wanted to throw it out here, since it would be surprising if we always had to write a script that iterates over the partitions and generates the per-partition statements. We have about a thousand partitions on this small table right now, and that number will grow by orders of magnitude.

BTW I tried the following without specifying the partition:

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed
WestCoastProjects asked Aug 29 '13 16:08


1 Answer

Yes, you can.

At least as of Hive v0.13, which is what I'm on. Use the partition spec syntax, but list only the partition column names without specific values (no =… bits).
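For the metrics table from the question, that would look roughly like the sketch below. The names partcol1 and partcol2 are placeholders borrowed from the syntax line in the question, since the table's real partition columns aren't shown; substitute your own.

```sql
-- Sketch only: partcol1 and partcol2 are placeholder names, not the
-- real partition columns of the metrics table (those are not shown).
-- Listing the partition columns without "= value" asks Hive to gather
-- statistics for every partition of the table in one statement.
ANALYZE TABLE metrics PARTITION(partcol1, partcol2) COMPUTE STATISTICS;
```

You can also pin some columns and leave others open, e.g. PARTITION(partcol1='some_value', partcol2), to cover only the partitions under one value of the first column ('some_value' is again just a placeholder).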

If you're using FOR COLUMNS, then you can't, due to this bug: https://issues.apache.org/jira/browse/HIVE-4861

msciwoj answered Sep 28 '22 12:09