
Alter hive table add or drop column

Tags: hadoop, hive

I have an ORC table in Hive, and I want to drop a column from it:

ALTER TABLE table_name drop  col_name;

But I get the following exception:

Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement

Can anyone help or suggest how to do this? Note that I am using Hive 0.14.

Aryan Singh asked Dec 10 '15 09:12



3 Answers

You cannot drop a column directly from a table with the command ALTER TABLE table_name drop col_name;

The only way to drop a column is with the REPLACE COLUMNS command. Say I have a table emp with the columns id, name, and dept, and I want to drop the id column. In the REPLACE COLUMNS clause, list every column that should remain part of the table. The command below drops the id column from the emp table.

 ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
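
As a minimal sketch of the effect, assuming a freshly created, empty emp table with the column types below (the types and the CREATE statement are illustrative, not from the question), the schema can be inspected before and after the change. The Hive documentation restricts REPLACE COLUMNS to tables with a native SerDe, so whether it is accepted for an ORC table may depend on the Hive version.

```sql
-- Hypothetical table matching the emp example above (types are assumed).
CREATE TABLE emp (id INT, name STRING, dept STRING);

-- Schema before the change: id, name, dept
DESCRIBE emp;

-- List only the columns to keep; id is left out and so is removed from the schema.
ALTER TABLE emp REPLACE COLUMNS (name STRING, dept STRING);

-- Schema after the change: name, dept
DESCRIBE emp;
```

The statement rewrites only the table metadata and leaves the data files as they are. In positional formats such as delimited text, columns are matched by position, so removing a leading column can shift the values that each remaining column reads. Sampling a few rows afterwards is a cheap check.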

Reena Upadhyay answered Sep 28 '22 17:09


There is also a "dumb" way to achieve the end goal: create a new table without the unwanted column(s). Hive's regex column matching makes this rather easy.

Here is what I would do:

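-- allow regex column specifications in backticks (needed in Hive 0.13 and later)
SET hive.support.quoted.identifiers=none;

-- rename the old table out of the way
ALTER TABLE table RENAME TO table_to_dump;

-- create the new table without the columns to be deleted
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- drop the old table
DROP TABLE table_to_dump;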

If the table in question is not too big, this should work just fine.
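
A variant of the same idea, sketched here under assumptions, names the columns to keep explicitly and asks for ORC storage on the new table, since a plain CREATE TABLE ... AS SELECT uses the default file format. The names my_table, my_table_old, keep_col_1, and keep_col_2 are placeholders, not taken from the question.

```sql
-- Rename the existing table out of the way (placeholder names throughout).
ALTER TABLE my_table RENAME TO my_table_old;

-- Rebuild the table as ORC with only the columns that should survive.
CREATE TABLE my_table STORED AS ORC AS
SELECT keep_col_1, keep_col_2
FROM my_table_old;

-- Drop the old table once the new one has been checked.
DROP TABLE my_table_old;
```

An explicit column list avoids depending on the regex setting, at the cost of typing out every column. The new table is a plain managed table, so properties of the old one such as partitioning or an external location would have to be recreated separately.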

ccy answered Sep 28 '22 16:09


Suppose you have an external table, organization.employee, defined as follows (TBLPROPERTIES omitted):

hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
      `employee_id` bigint,
      `employee_name` string,
      `updated_by` string,
      `updated_date` timestamp)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

To remove the updated_by and updated_date columns from the table, follow these steps:

Create a temporary copy of organization.employee:

hive> create table organization.employee_temp as select * from organization.employee;

Drop the main table organization.employee:

hive> drop table organization.employee;

Remove the underlying data from HDFS (run this outside the Hive shell). Because the table is external, dropping it does not delete its files:

[nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*

Recreate the table without the unwanted columns:

hive> CREATE EXTERNAL TABLE `organization.employee`(
  `employee_id` bigint,
  `employee_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

Insert the original records back into the recreated table:

hive> insert into organization.employee 
select employee_id, employee_name from organization.employee_temp;

Finally, drop the temporary table:

hive> drop table organization.employee_temp;
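
Since that last drop discards the only other copy of the data, a quick comparison run just before it may be worthwhile. The following is only a sketch using the table names from the steps above; the checks themselves are a suggestion, not part of the original procedure.

```sql
-- Row counts of the rebuilt table and the temporary copy should match.
SELECT COUNT(*) FROM organization.employee;
SELECT COUNT(*) FROM organization.employee_temp;

-- Sample a few rows to confirm only employee_id and employee_name remain.
SELECT * FROM organization.employee LIMIT 5;
```

If the two counts differ, the temporary table still holds the full data and the insert step can be repeated.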

Nameet Nayan answered Sep 28 '22 18:09