How to append new data to an already existing Hive table

Tags:

hadoop

hive

How do I append records to an existing partitioned Hive table? For example, I have an existing external table called "ip_country" whose dataset is testdata1. If the dataset grows, say the next day it consists of testdata1 and testdata2, how do I append the new data, i.e. "testdata2", to the "ip_country" Hive table?

marjun asked May 13 '15 10:05



1 Answer

It can be achieved in a couple of ways (which one fits depends purely on your requirements):

  1. If you don't mind overwriting the existing records in the partition (that is, you don't have a large history, say 10 years of data), then INSERT OVERWRITE might fit. It replaces whatever the target table or partition currently holds.

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

  2. If you don't mind duplicates in the partition, then INSERT INTO might fit. It appends to the existing data, so rows that are already present get inserted again (honestly, I wouldn't want duplicate records).

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

  3. If you have history data plus incremental data, then the history data can be inserted once and each incremental load (at whatever frequency you choose: daily, weekly, or fortnightly) can be inserted using INSERT OVERWRITE on the partition for that load, which leaves the other partitions untouched.
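
To make the options above concrete for the "ip_country" case, here is a minimal HiveQL sketch. Everything apart from the table name "ip_country" is an assumption made for illustration: the staging table ip_country_staging, the columns ip and country, the partition column load_date, its value, and the HDFS path are all hypothetical, so adjust them to your actual schema. The two INSERT statements are alternatives; run one or the other, not both.

```sql
-- Hypothetical staging table that exposes only the new day's file(s) (testdata2).
-- Column names, delimiter, and location are assumptions, not taken from the question.
CREATE EXTERNAL TABLE IF NOT EXISTS ip_country_staging (
  ip      STRING,
  country STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/staging/testdata2';

-- Alternative A (INSERT INTO): append the new rows to the partition.
-- Existing rows in the partition are kept, so re-running this creates duplicates.
INSERT INTO TABLE ip_country PARTITION (load_date = '2015-05-14')
SELECT ip, country
FROM ip_country_staging;

-- Alternative B (INSERT OVERWRITE): replace the contents of this one partition.
-- Other partitions of ip_country are left as they are, and a re-run is idempotent.
INSERT OVERWRITE TABLE ip_country PARTITION (load_date = '2015-05-14')
SELECT ip, country
FROM ip_country_staging;
```

If each day's data goes into its own partition, as assumed here, Alternative B gives you append-like behaviour at the table level without the duplicate risk of Alternative A.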
Partha Kaushik answered Nov 15 '22 06:11