 

storing pig output into Hive table in a single instance

I would like to insert the Pig output into Hive tables (the tables in Hive are already created with the exact schema); I just need to insert the output values into them. I don't want the usual method, where I first store the output into a file, then read that file from Hive, and then insert it into the tables. I need to eliminate that extra hop.

Is this possible? If so, please tell me how it can be done.

Thanks

asked Jul 08 '15 by Kirthika


People also ask

Which function is used to store the output in Pig?

Pig's store function is, in many ways, a mirror image of its load function. It is built on top of Hadoop's OutputFormat. It takes Pig Tuples and creates key-value pairs that its associated output format writes to storage.
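For example, a minimal Pig sketch of a store call (the relation name, paths, and schema here are purely illustrative):

-- illustrative only: read comma-delimited input, write it out pipe-delimited
sales = LOAD '/data/input/sales.csv' USING PigStorage(',') AS (id:int, amount:double);
STORE sales INTO '/data/output/sales' USING PigStorage('|');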

Can we insert data into Hive external table?

Hive provides multiple ways to add data to a table. We can use DML (Data Manipulation Language) queries in Hive to import or add data to a table. You can also place data files directly into the table's HDFS location with plain HDFS commands.
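As a rough sketch (the table name, values, and paths here are made up for illustration, and INSERT ... VALUES needs Hive 0.14 or later):

-- illustrative only
INSERT INTO TABLE my_table VALUES (1, 'Asha', 'Chennai');
LOAD DATA INPATH '/user/cloudera/staging/data.txt' INTO TABLE my_table;
-- or bypass Hive entirely and drop a file under the table's HDFS location:
-- hdfs dfs -put data.txt /user/hive/warehouse/my_table/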

How do you load and write Hive table inside a Pig program?

First start the Hive CLI, then create and load data into the table "profits", which is under the bdp schema. After executing the queries below, verify that the data is loaded successfully. Use the following command to create the table: CREATE SCHEMA IF NOT EXISTS bdp; CREATE TABLE bdp.
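The quoted DDL above is cut off; a hedged sketch of what the full sequence might look like (the profits columns and path are assumptions, not the original definition):

CREATE SCHEMA IF NOT EXISTS bdp;
-- column names/types below are illustrative; the original DDL is truncated
CREATE TABLE bdp.profits (product_id INT, profit DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/cloudera/staging/profits.csv' INTO TABLE bdp.profits;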


1 Answer

OK. Create an external Hive table whose location points to a directory in HDFS. Let's say:

create external table emp_records(id int,
                                  name String,
                                  city String)
row format delimited
fields terminated by '|'
location '/user/cloudera/outputfiles/usecase1';

Just create a table like the one above; there is no need to load any file into that directory.
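At this point the table is just an empty shell over that HDFS directory; a quick sanity check (assuming the DDL above) should return no rows yet:

SELECT * FROM emp_records LIMIT 5;   -- empty until files land under the location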

Now write a Pig script that reads data from some input directory, and when you store the output of that Pig script, store it as below:

A = LOAD 'inputfile.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
B = FILTER A BY id >= 678933;
C = FOREACH B GENERATE id, name, city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');

Ensure that the destination location, the delimiter, and the schema layout of the final FOREACH statement in your Pig script match the Hive DDL schema.
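Once the Pig STORE finishes, the rows should be queryable straight from Hive with no extra load step, for example:

-- hedged check against the emp_records table defined above
SELECT * FROM emp_records WHERE id >= 678933 LIMIT 10;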

answered Oct 27 '22 by Surender Raja