Alter hive table add or drop column

Tags:

hive

I have an ORC table in Hive and I want to drop a column from it:

ALTER TABLE table_name drop  col_name;

but I get the following exception:

Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement

Can anyone help me or suggest a way to do this? Note: I am using Hive 0.14.


asked Dec 10 '15 09:12

3 Answers

You cannot drop a column directly with ALTER TABLE table_name drop col_name; in Hive, the DROP clause of ALTER TABLE applies to partitions, which is why the parser reports that it expects PARTITION.

The way to drop a column through ALTER TABLE is the REPLACE COLUMNS clause. Say I have a table emp with the columns id, name and dept, and I want to drop the id column. List every column you want to keep in the REPLACE COLUMNS clause and leave out the one you want to drop. The command below drops the id column from the emp table.

 ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
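A minimal sketch of how one might check the result, assuming the emp table from this answer. Keep in mind that REPLACE COLUMNS rewrites only the table's metadata and leaves the data files untouched, and that the Hive documentation limits it to tables with a native SerDe. On an ORC table like the one in the question, it may therefore be rejected or leave column names misaligned with the stored data, depending on the Hive version.

```sql
-- Inspect the current schema (here: id, name, dept).
DESCRIBE emp;

-- List every column to keep; id is left out.
ALTER TABLE emp REPLACE COLUMNS (name string, dept string);

-- Confirm that only name and dept remain.
DESCRIBE emp;

-- Spot-check a few rows: because only metadata changed, make sure the
-- values still line up with the remaining column names.
SELECT name, dept FROM emp LIMIT 5;
```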


answered Sep 28 '22 17:09

Reena Upadhyay

There is also a "dumb" way of achieving the end goal: create a new table without the unwanted column(s). Hive's regex column matching makes this rather easy.

Here is what I would do:

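-- on Hive 0.13 and later, let backquoted strings in SELECT act as regexes
SET hive.support.quoted.identifiers=none;

-- rename the old table out of the way (my_table is a placeholder name)
ALTER TABLE my_table RENAME TO table_to_dump;

-- make the new table without the columns to be deleted
CREATE TABLE my_table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- dump the old table
DROP TABLE table_to_dump;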

If the table in question is not too big, this should work just fine.
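CREATE TABLE ... AS SELECT writes the new table in the default storage format unless told otherwise, so for an ORC table like the one in the question one might rebuild it as ORC and compare row counts before dropping the old copy. This is a sketch of that variant of the CREATE step, not part of the original answer; my_table, table_to_dump and the column names are placeholders.

```sql
-- Assumes hive.support.quoted.identifiers=none is set in this session
-- (needed on Hive 0.13 and later for the regex column specification).

-- Rebuild the table as ORC, keeping every column except the two named.
CREATE TABLE my_table STORED AS ORC AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- Compare row counts before dropping the old copy.
SELECT COUNT(*) FROM table_to_dump;
SELECT COUNT(*) FROM my_table;
```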

answered Sep 28 '22 16:09

ccy

Suppose you have an external table, organization.employee, defined as follows (TBLPROPERTIES omitted):

hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
      `employee_id` bigint,
      `employee_name` string,
      `updated_by` string,
      `updated_date` timestamp)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

You want to remove the updated_by and updated_date columns from the table. Follow these steps:

Create a temporary copy of organization.employee:

hive> create table organization.employee_temp as select * from organization.employee;

Drop the main table organization.employee:

hive> drop table organization.employee;

Remove the underlying data from HDFS. Dropping an external table does not delete its files, so this has to be done separately, outside the hive shell:

[nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*

Recreate the table without the unwanted columns:

hive> CREATE EXTERNAL TABLE `organization.employee`(
  `employee_id` bigint,
  `employee_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee';

Insert the original records back into the original table:

hive> insert into organization.employee 
select employee_id, employee_name from organization.employee_temp;

Finally, drop the temporary table:

hive> drop table organization.employee_temp;
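Because the last step discards the only other copy of the data, it may be worth checking the reload before running it. A minimal sketch of such a check, using the table names from this answer:

```sql
-- Row counts should match between the temporary copy and the rebuilt table.
SELECT COUNT(*) FROM organization.employee_temp;
SELECT COUNT(*) FROM organization.employee;

-- The rebuilt table should list only employee_id and employee_name.
DESCRIBE organization.employee;
```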

answered Sep 28 '22 18:09

Nameet Nayan

Aryan Singh
