Dropping multiple partitions in Impala/Hive

Q: How do I drop a partition in Impala?

Impala can drop or alter multiple partitions in a single statement. For example, you might drop a group of partitions corresponding to a particular date range after the data "ages out":

alter table historical_data drop partition (year < 1995);
alter table historical_data drop partition (year = 1996 and month between 1 and 6);

Q: Can we use multiple partitions in Hive?

Partitioning is a way of dividing a table into related parts based on the values of particular columns such as date, city, and department. Each table in Hive can have one or more partition keys that identify a particular partition. Partitioning makes it easy to query slices of the data.
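As a minimal sketch of a table with more than one partition key, the statements below reuse the historical_data name from the example above. The data columns (id, amount) are assumptions made up for illustration, and the statements have not been run against a real cluster.

```sql
-- Hypothetical table: id and amount are illustrative columns, not from the original post.
CREATE TABLE historical_data (id BIGINT, amount DOUBLE)
PARTITIONED BY (year INT, month INT);

-- Register one partition; a value is given for each partition key.
ALTER TABLE historical_data ADD PARTITION (year = 1996, month = 1);

-- Filtering on the partition keys lets the engine read only the matching slice.
SELECT COUNT(*) FROM historical_data WHERE year = 1996 AND month = 1;
```

Each distinct (year, month) combination becomes its own partition, which is what makes dropping a date range with a single statement possible.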

Tags: sql, hive, hdfs, partitioning, impala

1- I'm trying to delete multiple partitions at once, but I'm struggling to do it with either Impala or Hive. I tried the following query, with and without the single quotes:

ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS PARTITION (pr_load_time='20170701000317') PARTITION (pr_load_time='20170701000831')

The error I'm getting is as follows:

AnalysisException: Syntax error in line 3:
PARTITION (pr_load_time='20170701000831')
^
Encountered: PARTITION
Expected: CACHED, LOCATION, PURGE, SET, UNCACHED
CAUSED BY: Exception: Syntax error

The partition column is of type bigint. A query that deletes only one partition works as expected:

ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION   (pr_load_time='20170701000317')

2- Is it good practice to delete the HDFS data first and then drop the partitions in Impala/Hive, or should it be done the other way around?

asked Aug 07 '17 09:08

k_mishap

1 Answer

1.

Your syntax is wrong.
In the DROP command the partitions should be separated by commas.

Demo

hive> create table t (i int) partitioned by (p int);
OK

hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK

hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5

hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK

hive> show partitions t;
OK
partition
p=4
p=5

2.

You can drop a range.

Demo

hive> create table t (i int) partitioned by (p int);
OK

hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK

hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5

hive> alter table t drop if exists partition (p<=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK

hive> show partitions t;
OK
partition
p=4
p=5
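Applied to the table from the question, the two approaches above might look like the sketch below. It is untested. The Hive statement follows the comma-separated form from the first demo and keeps the quoted values from the question. The Impala statement assumes Impala 2.8 or higher, where the partition clause of a DROP can contain operators such as IN, BETWEEN, and comparisons.

```sql
-- Hive: separate the partition specs with commas.
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
  PARTITION (pr_load_time='20170701000317'),
  PARTITION (pr_load_time='20170701000831');

-- Impala (assumption: version 2.8 or higher): a single partition expression
-- that matches both values of the bigint partition column.
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
  PARTITION (pr_load_time IN (20170701000317, 20170701000831));
```

Running show partitions before and after, as in the demos, is a cheap way to confirm which partitions were actually removed.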

answered Sep 23 '22 08:09

David דודו Markovitz
