Add a column in a table in HIVE QL

Tags: hadoop, hive, hiveql

I'm writing code in Hive to create a table consisting of 1300 rows and 6 columns:

    CREATE TABLE test1 AS
    SELECT cd_screen_function,
           SUM(access_count)      AS max_count,
           MIN(response_time_min) AS response_time_min,
           AVG(response_time_avg) AS response_time_avg,
           MAX(response_time_max) AS response_time_max,
           SUM(response_time_tot) AS response_time_tot,
           COUNT(*)               AS row_count
    FROM sheet
    WHERE ts_update BETWEEN unix_timestamp('2012-11-01 00:00:00')
                        AND unix_timestamp('2012-11-30 00:00:00')
      AND cd_office = '016'
    GROUP BY cd_screen_function
    ORDER BY max_count DESC, cd_screen_function;

Now I want to add another column, access_count1, that holds the same value in all 1300 rows. That value should be sum(max_count), where max_count is a column in my existing table. How can I do that? I am trying to alter the table with this statement: ALTER TABLE test1 ADD COLUMNS (access_count1 int) set default sum(max_count);


asked Oct 25 '13 12:10

user2532312

1 Answer

You cannot add a column with a default value in Hive. You have the right syntax for adding the column, ALTER TABLE test1 ADD COLUMNS (access_count1 int); you just need to get rid of set default sum(max_count). Adding the column does not change the files backing your table. Hive handles the "missing" data by interpreting NULL as the value for every cell in that column.
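As a minimal sketch of the statement described above, assuming the test1 table from the question already exists, the column can be added and then inspected like this. The LIMIT value is arbitrary and only keeps the check short.

```sql
-- Add the column without a default; only the table metadata changes,
-- the files backing test1 are left as they are.
ALTER TABLE test1 ADD COLUMNS (access_count1 INT);

-- Existing rows have no stored value for the new column,
-- so Hive reports NULL for access_count1 in every row.
SELECT cd_screen_function, max_count, access_count1
FROM test1
LIMIT 5;
```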

So now you have the problem of populating the column. Unfortunately, in Hive you essentially need to rewrite the whole table, this time with the column populated. It may be easier to rerun your original query with the new column included. Alternatively, you could add the column to the table you have now, then select all of its columns plus a value for the new column.
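One way to rewrite the table with the new column populated might look like the sketch below. It builds a new table rather than overwriting test1 in place. The table name test1_with_total and the alias s are hypothetical, and the CROSS JOIN assumes a Hive version that supports it and a configuration that permits Cartesian products (strict mode may reject them).

```sql
-- test1_with_total is a hypothetical name for the rewritten table.
CREATE TABLE test1_with_total AS
SELECT
    t.cd_screen_function,
    t.max_count,
    t.response_time_min,
    t.response_time_avg,
    t.response_time_max,
    t.response_time_tot,
    t.row_count,
    s.access_count1          -- same total repeated on every row
FROM test1 t
CROSS JOIN (
    -- One-row subquery holding the grand total of max_count.
    SELECT SUM(max_count) AS access_count1
    FROM test1
) s;
```

Because the subquery returns exactly one row, the join keeps the row count of test1 unchanged. Note that SUM over an integer column yields a BIGINT in Hive, so the new column may be wider than the int declared in the question.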

You also have the option to leave the column NULL for now and always COALESCE it to your desired default when you read it. This option fails when you want NULL to have a meaning distinct from your desired default. It also depends on you always remembering to apply COALESCE.
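A rough illustration of the COALESCE approach, assuming access_count1 has already been added to test1 and is still NULL. The literal 0 is only a placeholder for whatever default you actually want.

```sql
-- Substitute a default at read time instead of rewriting the table.
SELECT
    cd_screen_function,
    max_count,
    COALESCE(access_count1, 0) AS access_count1  -- 0 is a placeholder default
FROM test1;
```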

If you are very confident in your abilities to deal with the files backing Hive, you could also directly alter them to add your default. In general I would recommend against this because most of the time it will be slower and more dangerous. There might be some case where it makes sense though, so I've included this option for completeness.

answered Oct 13 '22 14:10

Daniel Koverman
