
What are the consequences of adding a column to an existing Hive table?

Suppose that, a couple of hundred gigabytes after starting to use Hive, I want to add a column. From the various articles and pages I have seen, I cannot work out the consequences in terms of:

  • storage space required (does it double?)
  • blocking (can I still read the table from other processes?)
  • time (is it quick, or as slow as a MySQL schema change?)
  • underlying storage (do I need to change all the underlying files? How can that be done with RCFile?)

Bonus to whoever can answer the same question for structs in a Hive column.

Philippe Girolami asked Feb 21 '11 12:02


People also ask

Can we add column to the existing table in Hive?

Yes. Use the command: ALTER TABLE table_name ADD COLUMNS (column_name data_type);

How do I add a column to a specific position in Hive?

First add the column: ALTER TABLE table_name ADD COLUMNS (user_id BIGINT); Then, to make user_id the first column of the table, use CHANGE COLUMN with the FIRST clause: ALTER TABLE table_name CHANGE COLUMN user_id user_id BIGINT FIRST; This moves user_id to the first position in the table's schema.

Can a Hive table contain data in more than one format?

Hive expects all the files for one table to use the same format: the same delimiter, the same compression, and so on. You therefore cannot define a single Hive table on top of files in multiple formats. A workaround is to create one table per format and a view over the UNION of those tables.

How do I remove a column from a table in Hive?

Hive has no DROP COLUMN statement, but REPLACE COLUMNS can achieve the same result. For example, after CREATE TABLE test_change (a int, b int, c int); the statement ALTER TABLE test_change REPLACE COLUMNS (a int, b int); removes column 'c' from test_change's schema.


1 Answer

If you add a column to a Hive table, only the metastore is updated; the underlying data files are not touched.

  • The required storage space does not increase as long as you do not add data.
  • The change can be made while other processes are accessing the table.
  • The change is very quick, because only the metastore is updated.
  • You do not have to change the underlying files. Existing records have the value NULL for the new column.
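
The points above can be sketched in HiveQL. This is only a minimal illustration: the table name page_views and the column referrer are hypothetical, and the table is assumed to be unpartitioned.

```sql
-- Hypothetical table and column names, used only for illustration.
-- Metadata-only change: the files already stored for the table are not rewritten.
ALTER TABLE page_views ADD COLUMNS (referrer STRING COMMENT 'added after the initial load');

-- The new column now appears at the end of the table's column list.
DESCRIBE page_views;

-- Rows written before the change have no stored value for the new column,
-- so it should be read back as NULL.
SELECT referrer FROM page_views LIMIT 10;
```

For a partitioned table the behaviour can differ between Hive versions, since each partition carries its own metadata, so it is worth checking the ALTER TABLE documentation for your release before relying on this.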

I hope this helps.

Helmut Zechmann answered Oct 12 '22 02:10