
How do I drop all partitions at once in Hive?

Hive version 1.1

I have an external Hive table defined as below:

 CREATE EXTERNAL TABLE `schedule_events`(
   `schedule_id` string COMMENT 'from deserializer',
   `service_key` string COMMENT 'from deserializer',
   `event_start_date_time` string COMMENT 'from deserializer',
   `event_id` string COMMENT 'from deserializer',
   `event_type` string COMMENT 'from deserializer',
   `transitional_key` string COMMENT 'from deserializer',
   `created_date_time` string COMMENT 'from deserializer',
   `bus_date` string COMMENT 'from deserializer')
 PARTITIONED BY (
   `year` string,
   `month` string,
   `day` string)
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION
   'hdfs://nameservice1/hadoop/raw/omega/scheduled_events'
 TBLPROPERTIES (
   'avro.schema.url'='hdfs:////hadoop/raw/omega/schema/schedule_events.avsc',
   'transient_lastDdlTime'='1505742141')

To drop a particular partition, I can run an ALTER command as below:

 ALTER TABLE schedule_events DROP IF EXISTS PARTITION  (year='2016',month='06',day='01')
 Dropped the partition year=2016/month=06/day=01

 hive> show partitions schedule_events;
 OK
 year=2017/month=09/day=01
 year=2017/month=09/day=02
 year=2017/month=09/day=03
 year=2017/month=09/day=04
 year=2017/month=09/day=05

But this table has many partitions.

How do I drop all existing partitions at once? Is that possible?

Surender Raja asked Sep 19 '17 18:09


1 Answer

There are multiple options; here is one. Every partition has a non-empty year value, so the predicate year<>'' matches all of them:

alter table schedule_events drop if exists partition (year<>'');

Hive: Extend ALTER TABLE DROP PARTITION syntax to use all comparators

"... To drop a partition from a Hive table, this works:
ALTER TABLE foo DROP PARTITION(ds = 'date')
...but it should also work to drop all partitions prior to date.
ALTER TABLE foo DROP PARTITION(ds < 'date')
This task is to implement ALTER TABLE DROP PARTITION for all of the comparators, < > <= >= <> = != instead of just for ="

https://issues.apache.org/jira/browse/HIVE-2908
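
The comparator syntax described in the ticket can also target a subset of partitions rather than all of them. The following is a sketch against the schedule_events table from the question, not verified on Hive 1.1; the specific year, month, and day values are illustrative assumptions. Because the partition columns are strings, the comparison is lexicographic, which lines up with date order only while the values stay zero-padded. Also note that for an EXTERNAL table, dropping a partition is generally expected to remove only the metastore entry and leave the files under the table's LOCATION in place.

```sql
-- Assumption: illustrative values only; adjust to the partitions you actually have.

-- Drop the September 2017 partitions before day 04 (string comparison on day).
ALTER TABLE schedule_events DROP IF EXISTS PARTITION (year = '2017', month = '09', day < '04');

-- Drop every partition of one year by giving only part of the partition spec.
ALTER TABLE schedule_events DROP IF EXISTS PARTITION (year = '2016');

-- Check what is left in the metastore afterwards.
SHOW PARTITIONS schedule_events;
```

If the underlying files should go away as well, they would likely need to be removed from HDFS separately, since the table is external.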

David דודו Markovitz answered Oct 16 '22 01:10