
Storing data to SequenceFile from Apache Pig

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

log = LOAD '/data/logs' USING SequenceFileLoader AS (...);

Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

asked Mar 11 '10 at 09:03 by asquithea



1 Answer

It's just a matter of implementing a StoreFunc to do so.

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.

The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same; you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward, though.
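For illustration, here is a minimal sketch of what such a StoreFunc might look like against the redesigned Load/Store interface (Pig 0.7 and later). The class name TextSequenceFileStorage is hypothetical and not part of Pig or PiggyBank, and the sketch assumes each tuple carries a key in field 0 and a value in field 1, both written as Text. It needs the Pig and Hadoop jars on the classpath to compile, and it has not been run against a cluster.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

// Hypothetical StoreFunc: writes (key, value) tuples to a Hadoop SequenceFile
// as Text/Text pairs. Assumes the Pig 0.7+ StoreFunc API.
public class TextSequenceFileStorage extends StoreFunc {

    private RecordWriter<Text, Text> writer;

    @SuppressWarnings("rawtypes")
    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // Reuse Hadoop's existing sequence file output format.
        return new SequenceFileOutputFormat<Text, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // SequenceFileOutputFormat reads the key/value classes from the job.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        // Pig hands over the RecordWriter created by the output format above.
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        if (t == null || t.size() < 2) {
            throw new IOException("Expected a tuple of the form (key, value)");
        }
        // Assumption: both fields are rendered via their string form;
        // a null field is written as the literal text "null".
        Text key = new Text(String.valueOf(t.get(0)));
        Text value = new Text(String.valueOf(t.get(1)));
        try {
            writer.write(key, value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }
}
```

Once packaged into a jar and registered with REGISTER, it would be used along the lines of `STORE log INTO '/data/out' USING TextSequenceFileStorage();` (the output path is a placeholder). Supporting other Writable types or richer schemas would need type conversion logic that this sketch leaves out.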

answered Oct 13 '22 at 19:10 by SquareCog