How to delete duplicate records from a Hive table?

Tags:

hive

I am trying to learn how to delete duplicate records from a Hive table.

My Hive table is 'dynpart', with columns Id, Name, Technology:

Id  Name  Technology
1   Abcd  Hadoop
2   Efgh  Java
3   Ijkl  MainFrames
2   Efgh  Java

We have options like 'Distinct' to use in a select query, but a select query only retrieves data from the table. Could anyone explain how to use a delete query to remove the duplicate rows from a Hive table?

I understand that deleting or updating records is not recommended or standard practice in Hive, but I want to learn how it is done.


asked Apr 07 '17 13:04


1 Answer

You can use an insert overwrite statement to replace the table's contents with only its distinct rows:

insert overwrite table dynpart select distinct * from dynpart;
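
If rows should count as duplicates when only some columns match (for example, the same Id with a differently spelled Name), select distinct * will not collapse them. A minimal sketch using a window function follows. It assumes that dynpart is not partitioned, that Id alone identifies a duplicate, and that it does not matter which row of each group survives. The aliases t, rn and cnt are illustrative names only.

```sql
-- Keep exactly one row per Id (assumption: Id identifies a duplicate).
-- row_number() numbers the rows inside each Id group; rn = 1 keeps the first.
insert overwrite table dynpart
select Id, Name, Technology
from (
  select Id, Name, Technology,
         row_number() over (partition by Id order by Name, Technology) as rn
  from dynpart
) t
where rn = 1;

-- Sanity check: this should return no rows once the duplicates are gone.
select Id, Name, Technology, count(*) as cnt
from dynpart
group by Id, Name, Technology
having count(*) > 1;
```

Like the statement above, this rewrites the whole table instead of deleting rows in place. Hive's DELETE statement is only available on transactional (ACID) tables, so on an ordinary table the overwrite is the usual way to drop duplicates.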


answered Oct 12 '22 17:10

fi11er
