Main difference between dynamic and static partitioning in Hive

2 Answers

In static partitioning, we need to specify the partition column value in each and every LOAD statement.

Suppose table t1(userid, name, occupation, country) is partitioned on the column country. Each time we load data, we have to provide the country value:

hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="US");
hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="UK");
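For context, here is a minimal sketch of what the partitioned table behind these LOAD statements might look like. The column types, the comma delimiter, and the file paths are assumptions for illustration; the original only names the columns. The files are assumed to contain just the three non-partition columns, since the country value comes from the PARTITION clause rather than from the file contents.

```sql
-- Hypothetical DDL for t1: country is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE IF NOT EXISTS t1 (
  userid     INT,
  name       STRING,
  occupation STRING
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Static partitioning: the partition value is spelled out for every load.
-- The paths below are placeholders.
LOAD DATA INPATH '/data/users_us.csv' INTO TABLE t1 PARTITION (country = 'US');
LOAD DATA INPATH '/data/users_uk.csv' INTO TABLE t1 PARTITION (country = 'UK');

-- List the partitions that now exist on the table.
SHOW PARTITIONS t1;
```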

Dynamic partitioning lets us avoid specifying the partition column value each time. The approach we follow is as below:

Create a non-partitioned table t2 and insert data into it.
Now create a table t1 partitioned on the intended column (say country).

Load data into t1 from t2 as below:

hive> INSERT INTO TABLE t1 PARTITION(country) SELECT * FROM t2;

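Make sure that the partition column is always the last one in the non-partitioned table (as with the country column in t2), because Hive matches dynamic partition columns by position against the last columns of the SELECT, not by name.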
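The whole flow could be sketched roughly as follows, with t2 as the non-partitioned staging table and t1 as the partitioned target. The two SET properties are the standard Hive switches for dynamic partitioning; the column types, delimiter, and file path are assumptions. Listing the columns explicitly instead of using SELECT * makes it harder to get the column order wrong.

```sql
-- Allow dynamic partitions; nonstrict mode permits an insert in which
-- every partition column is dynamic (no static value given).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Non-partitioned staging table: country is an ordinary, last column here.
CREATE TABLE IF NOT EXISTS t2 (
  userid     INT,
  name       STRING,
  occupation STRING,
  country    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Placeholder path for a file that holds rows for many countries.
LOAD DATA INPATH '/data/users_all.csv' INTO TABLE t2;

-- Partitioned target table.
CREATE TABLE IF NOT EXISTS t1 (
  userid     INT,
  name       STRING,
  occupation STRING
)
PARTITIONED BY (country STRING);

-- Hive creates one partition per distinct country value it encounters.
-- The partition column must come last in the SELECT list.
INSERT INTO TABLE t1 PARTITION (country)
SELECT userid, name, occupation, country
FROM t2;
```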

142

answered Sep 19 '22 19:09

Azam Khan

Partitioning in Hive is very useful for pruning data at query time, which reduces query times.
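As a small illustration of pruning, assume the table t1 from the first answer, partitioned on country. A filter on the partition column lets Hive read only the matching partition directory instead of the whole table. EXPLAIN DEPENDENCY is one way to check which partitions a query would actually touch.

```sql
-- Only the country=US partition needs to be scanned for this query.
SELECT userid, name
FROM t1
WHERE country = 'US';

-- Reports the input tables and input partitions the query depends on.
EXPLAIN DEPENDENCY
SELECT userid, name
FROM t1
WHERE country = 'US';
```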

Partitions are created when data is inserted into a table. Which kind of partitioning you need depends on how you load the data. When loading files (especially big files) into Hive tables, static partitions are usually preferred, because they save loading time compared to dynamic partitions: you "statically" add a partition to the table and move the file into that partition. Since the files are big, they are usually generated in HDFS already. You can get the partition column value from the filename, the date, etc. without reading the whole big file.
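A rough sketch of that static route is shown below. The table name events, its columns, the partition column event_day, and the file path are all hypothetical. The point is that the partition value is taken from the file name by whoever issues the statement, so Hive never has to read the rows to decide where they belong.

```sql
-- Hypothetical table partitioned by day.
CREATE TABLE IF NOT EXISTS events (
  userid INT,
  action STRING
)
PARTITIONED BY (event_day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- The file name events_2022-09-19.csv already tells us the day,
-- so the partition can be added "statically" up front.
ALTER TABLE events ADD IF NOT EXISTS PARTITION (event_day = '2022-09-19');

-- LOAD DATA INPATH moves the HDFS file into the partition directory;
-- the rows themselves are not parsed at load time.
LOAD DATA INPATH '/landing/events_2022-09-19.csv'
INTO TABLE events PARTITION (event_day = '2022-09-19');
```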

In the case of dynamic partitioning, the whole big file, i.e. every row of the data, is read, and the data is partitioned through an MR job into the destination table depending on a certain field in the file. So dynamic partitions are usually useful when you are doing some sort of ETL flow in your data pipeline. E.g. you load a huge file through a move command into a table X, then you run an insert query into a table Y and partition the data based on fields in table X, say day and country. You may then want to run a further ETL step to partition the data in one country partition of table Y into a table Z, where data is partitioned by city for that particular country only.
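The X to Y to Z flow described above might look something like this. All schemas, the path, and the choice of 'US' as the country are assumptions made up for the sketch; the column event_day stands in for the "day" field.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Table X: unpartitioned landing table (hypothetical schema).
CREATE TABLE IF NOT EXISTS x (
  userid    INT,
  name      STRING,
  city      STRING,
  event_day STRING,
  country   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Step 1: move the huge file into X. This is a file move, not a row scan.
LOAD DATA INPATH '/landing/big_file.csv' INTO TABLE x;

-- Table Y: partitioned by day and country.
CREATE TABLE IF NOT EXISTS y (
  userid INT,
  name   STRING,
  city   STRING
)
PARTITIONED BY (event_day STRING, country STRING);

-- Step 2: every row of X is read and routed to its (event_day, country)
-- partition. Partition columns come last, in PARTITION-clause order.
INSERT INTO TABLE y PARTITION (event_day, country)
SELECT userid, name, city, event_day, country
FROM x;

-- Table Z: data for one country only, partitioned by city.
CREATE TABLE IF NOT EXISTS z (
  userid    INT,
  name      STRING,
  event_day STRING
)
PARTITIONED BY (city STRING);

-- Step 3: the country filter prunes Y down to the US partitions,
-- and the rows are re-partitioned dynamically by city.
INSERT INTO TABLE z PARTITION (city)
SELECT userid, name, event_day, city
FROM y
WHERE country = 'US';
```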

Thus, depending on your end table or data requirements, and on the form in which the data is produced at the source, you may choose static or dynamic partitioning.

answered Sep 22 '22 19:09

Urvishsinh Mahida



Tags:

hive

Ronak
