Create Hive external table from partitioned Parquet files in Azure HDInsight

I have data saved as Parquet files in Azure Blob Storage. The data is partitioned by year, month, day and hour, like:

cont/data/year=2017/month=02/day=01/

I want to create an external table in Hive using the following CREATE statement, which I wrote using this reference.

CREATE EXTERNAL TABLE table_name (uid string, title string, value string) 
PARTITIONED BY (year int, month int, day int) STORED AS PARQUET 
LOCATION 'wasb://cont@storage_name.blob.core.windows.net/data';

This creates the table, but queries return no rows. I tried the same CREATE statement without the PARTITIONED BY clause and that seems to work, so it looks like the issue is with partitioning.

What am I missing?

asked Apr 11 '17 12:04

chhantyal

1 Answer

After you create the partitioned table, run the following to add the directories as partitions:

MSCK REPAIR TABLE table_name;

If you have a large number of partitions, you might need to set hive.msck.repair.batch.size:

When there is a large number of untracked partitions, there is a provision to run MSCK REPAIR TABLE batch wise to avoid OOME (Out of Memory Error). By giving the configured batch size for the property hive.msck.repair.batch.size it can run in the batches internally. The default value of the property is zero, it means it will execute all the partitions at once.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

Written by the OP:

This will probably fix your issue; however, if the data is very large, it won't work. See the relevant issue here.

As a workaround, there is another way to add partitions to the Hive metastore one by one, like:

alter table table_name add partition(year=2016, month=10, day=11, hour=11);

We wrote a simple script to automate this ALTER statement, and it seems to work for now.
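The script itself is not shown, so the following is only a minimal sketch of what such automation might look like, not the OP's actual code. It generates one ALTER TABLE ... ADD PARTITION statement per hour in a time range. The function name partition_statements, the hourly granularity, the zero-padded hour= directory name and the optional LOCATION clause are all assumptions made for illustration. The partition keys in the generated statements must also match the columns declared in the table's PARTITIONED BY clause.

```python
from datetime import datetime, timedelta


def partition_statements(table, start, end, base_location=None):
    """Yield one ADD PARTITION statement per hour in the range [start, end).

    Hypothetical helper: assumes the table is partitioned by
    year, month, day and hour, as in the ALTER statement above.
    """
    current = start
    while current < end:
        spec = (
            f"year={current.year}, month={current.month}, "
            f"day={current.day}, hour={current.hour}"
        )
        stmt = f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec})"
        if base_location is not None:
            # Assumption: directories are zero-padded (month=02, day=01),
            # as in the question's example path; hour= padding is a guess.
            path = (
                f"{base_location}/year={current.year}"
                f"/month={current.month:02d}/day={current.day:02d}"
                f"/hour={current.hour:02d}"
            )
            stmt += f" LOCATION '{path}'"
        yield stmt + ";"
        current += timedelta(hours=1)


if __name__ == "__main__":
    # Print statements for two hours of data; redirect to a .hql file to run later.
    for statement in partition_statements(
        "table_name",
        datetime(2016, 10, 11, 11),
        datetime(2016, 10, 11, 13),
        base_location="wasb://cont@storage_name.blob.core.windows.net/data",
    ):
        print(statement)
```

The printed statements can be saved to a file and executed with hive -f. IF NOT EXISTS makes the script safe to re-run over a range that was already registered.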

answered Oct 31 '22 09:10

David דודו Markovitz


Tags: hive, azure, parquet, azure-hdinsight