Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add a column in a table in HIVE QL

I'm writing a code in HIVE to create a table consisting of 1300 rows and 6 columns:

create table test1 as SELECT cd_screen_function,      SUM(access_count) AS max_count,      MIN(response_time_min) as response_time_min,      AVG(response_time_avg) as response_time_avg,      MAX(response_time_max) as response_time_max,      SUM(response_time_tot) as response_time_tot,      COUNT(*) as row_count      FROM sheet WHERE  ts_update BETWEEN unix_timestamp('2012-11-01 00:00:00') AND       unix_timestamp('2012-11-30 00:00:00') and cd_office = '016'      GROUP BY cd_screen_function ORDER BY max_count DESC, cd_screen_function; 

Now I want to add another column as access_count1 which consists one unique value for all 1300 rows and value will be sum(max_count). max_count is a column in my existing table. How I can do that? I am trying to alter the table by this code ALTER TABLE test1 ADD COLUMNS (access_count1 int) set default sum(max_count);

like image 262
user2532312 Avatar asked Oct 25 '13 12:10

user2532312


People also ask

How do you add a column to a table?

Add a column to the left or right Under Table Tools, on the Layout tab, do one of the following: To add a column to the left of the cell, click Insert Left in the Rows and Columns group. To add a column to the right of the cell, click Insert Right in the Rows and Columns group.

How do I add a column to a specific position in Hive?

ALTER TABLE table_name ADD COLUMNS (user_id BIGINT); Now to make user_id column as the first column in your table use change column with FIRST clause: ALTER TABLE table_name CHANGE COLUMN user_id user_id BIGINT first; This will move the user_id column to the first position.

How do you update columns in Hive?

You use the UPDATE statement to modify data already stored in an Apache Hive table. You construct an UPDATE statement using the following syntax: UPDATE tablename SET column = value [, column = value ...]

Can we ALTER TABLE in Hive?

Hive provides us the functionality to perform Alteration on the Tables and Databases. ALTER TABLE command can be used to perform alterations on the tables. We can modify multiple numbers of properties associated with the table schema in the Hive.


1 Answers

You cannot add a column with a default value in Hive. You have the right syntax for adding the column ALTER TABLE test1 ADD COLUMNS (access_count1 int);, you just need to get rid of default sum(max_count). No changes to that files backing your table will happen as a result of adding the column. Hive handles the "missing" data by interpreting NULL as the value for every cell in that column.

So now your have the problem of needing to populate the column. Unfortunately in Hive you essentially need to rewrite the whole table, this time with the column populated. It may be easier to rerun your original query with the new column. Or you could add the column to the table you have now, then select all of its columns plus value for the new column.

You also have the option to always COALESCE the column to your desired default and leave it NULL for now. This option fails when you want NULL to have a meaning distinct from your desired default. It also requires you to depend on always remembering to COALESCE.

If you are very confident in your abilities to deal with the files backing Hive, you could also directly alter them to add your default. In general I would recommend against this because most of the time it will be slower and more dangerous. There might be some case where it makes sense though, so I've included this option for completeness.

like image 70
Daniel Koverman Avatar answered Oct 13 '22 14:10

Daniel Koverman