
Why do we need to move data from an external table to a managed Hive table?

I am new to Hadoop and learning Hive.

I don't understand the following passage about external tables in Hive, from the last paragraph on page 428 of Hadoop: The Definitive Guide (3rd edition):

"A common pattern is to use an external table to access an initial dataset stored in HDFS (created by another process), then use a Hive transform to move the data into a managed Hive table."

Can anybody briefly explain what this passage means?

Raj asked Aug 19 '13 11:08


1 Answer

The initial dataset is usually not laid out in the best way for queries.
You may want to modify the data (change or add columns, compute aggregations, etc.) and store it in a specific layout (partitioned, bucketed, sorted, etc.) so that queries benefit from these optimizations. The external table lets Hive read the files where they already are; the transform step then writes the reshaped data into a managed table.
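
As a rough illustration of that pattern, here is a minimal HiveQL sketch. All names in it are hypothetical and not taken from the book: the tables raw_logs and logs, the path /data/raw/logs, the columns, and the bucket count. It also assumes the ts column holds strings such as '2013-08-19 11:08:00', so that the first 10 characters give the date.

```sql
-- Hypothetical names throughout; adjust to your own data.

-- 1. External table over files that another process already wrote to HDFS.
--    Dropping this table removes only the metadata, not the files.
CREATE EXTERNAL TABLE raw_logs (
  ts      STRING,
  user_id STRING,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/logs';

-- 2. Managed table laid out for querying:
--    partitioned by day, bucketed by user, sorted by timestamp.
CREATE TABLE logs (
  ts      STRING,
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) SORTED BY (ts) INTO 16 BUCKETS;

-- 3. The transform step: read from the external table and write
--    into the managed table, deriving the partition column on the way.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.enforce.bucketing = true;  -- only needed on Hive versions before 2.0

INSERT OVERWRITE TABLE logs PARTITION (dt)
SELECT ts, user_id, url, substr(ts, 1, 10) AS dt
FROM raw_logs;
```

After the insert, queries that filter on dt only read the matching partitions of logs, while the original files under the external location stay untouched for whatever process produced them.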

dimamah answered Sep 20 '22 18:09