I know there is no file update in Hadoop, but in Hive you can achieve the same effect with some syntactic sugar: merge the new values with the old data and rewrite the table with the merged output. If the new values live in another table, the merge can be done with a left outer join.
The problem in my case is that I have to update a table by setting one value for a column under some WHERE condition, and as far as I know SET is not supported in Hive.
For example, consider this standard SQL query:
UPDATE temp1
SET location=florida
WHERE id=206;
I tried to convert this to Hive but got stuck at the SET clause. If anyone could point me in the right direction, it would be a great help.
You use the UPDATE statement to modify data already stored in an Apache Hive table. The syntax is: UPDATE tablename SET column = value [, column = value ...] [WHERE expression];
Although recent Hive versions support ACID transactions, and with them UPDATE and DELETE, ACID is not enabled by default, so out of the box you cannot perform UPDATE or DELETE operations.
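For reference, if you do want native UPDATE support rather than a workaround, ACID has to be switched on and the table has to be transactional. A minimal sketch, assuming a Hive 1.x/2.x-style setup (the table name temp1_acid and its columns are illustrative; exact requirements vary by Hive version, so treat this as a starting point, not a recipe):

```sql
-- Session-level settings needed for ACID transactions
-- (on Hive 3+ distributions these are often already set cluster-wide).
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The table must be stored as ORC, bucketed, and flagged transactional.
CREATE TABLE temp1_acid (id INT, location STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- With that in place, the statement from the question works as written:
UPDATE temp1_acid SET location = 'florida' WHERE id = 206;
```

The rest of this answer assumes you are staying on a non-ACID table, which is where the INSERT OVERWRITE pattern below comes in.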
INSERT OVERWRITE TABLE _tableName_ PARTITION (_partitionColumn_= _partitionValue_)
SELECT [other Things], CASE WHEN id=206 THEN 'florida' ELSE location END AS location, [other Other Things]
FROM _tableName_ WHERE [_whereClause_];
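Applied to the temp1 example from the question, the pattern looks like this (assuming, purely for illustration, that the table has columns id, name, and location and is not partitioned; with a partitioned table keep the PARTITION clause as above):

```sql
-- Rewrite the whole table, substituting the new value only where id = 206.
-- The column order in the SELECT must match the table definition exactly.
INSERT OVERWRITE TABLE temp1
SELECT
    id,
    name,
    CASE WHEN id = 206 THEN 'florida' ELSE location END AS location
FROM temp1;
```

Every row is read and written back; only the rows matching the CASE condition come out changed.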
You can list multiple partitions by separating them with commas: ... PARTITION (_partitionColumn_= _partitionValue1_, _partitionColumn_= _partitionValue2_, ...). I haven't done this with multiple partitions, only one at a time, so I'd check the results in a test/dev environment before overwriting all partitions at once. I had other reasons for limiting each OVERWRITE to a single partition as well.
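A sketch of the one-partition-at-a-time approach, again using the hypothetical temp1 columns from above plus a made-up partition column dt:

```sql
-- Only the named partition is rewritten; all other partitions are
-- left untouched. Note the partition column is NOT selected, because
-- its value is fixed by the static PARTITION clause.
INSERT OVERWRITE TABLE temp1 PARTITION (dt = '2023-01-01')
SELECT
    id,
    name,
    CASE WHEN id = 206 THEN 'florida' ELSE location END AS location
FROM temp1
WHERE dt = '2023-01-01';
```

Repeating this per partition keeps each OVERWRITE small and easy to verify (or roll back) before moving on to the next one.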
This page, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML, has a little more detail. The Hive Language Manual, https://cwiki.apache.org/confluence/display/Hive/LanguageManual, is in general your best friend when working with HiveQL.
I've used something essentially identical to this to migrate some data, and it worked. I haven't tried it against large datasets, only a few GB, but at that scale it worked perfectly.
To note: this will OVERWRITE the partition, permanently deleting its previous files. Create backup and restore scripts/procedures first.
The [other Things] and [other Other Things] are the rest of the columns from the table, and they need to be in the correct order. This is very important, or else your data will be corrupted.
Hope this helps. :)
This may be hacky, but it has worked for some things I've had to do at work.
INSERT OVERWRITE TABLE tabletop PARTITION(partname = 'valueIwantToSet')
SELECT things FROM databases.tables WHERE whereclause;
As you might expect, this breaks your data up into partitions. If the distribution of the value you want to set lines up with good data chunk sizes (that part is up to you to design), then your queries on that data will also be better optimized.
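Applied to the question's example, the idea is that the column being "set" becomes the partition key, so updating it amounts to rewriting one partition rather than the whole table (table and column names below are illustrative):

```sql
-- Hypothetical table partitioned by location: "setting" location to
-- 'florida' for the matching rows means writing them into that partition.
INSERT OVERWRITE TABLE temp1_by_location PARTITION (location = 'florida')
SELECT id, name
FROM temp1
WHERE id = 206;
```

The trade-off is that this only helps when the column you need to update is a sensible partition key to begin with.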
@Jothi: Could you please post the query you used?