
Hadoop Pig: Passing Command Line Arguments

Is there a way to pass command line arguments to a Pig script, e.g., the name of the file to be processed?

asked Nov 12 '10 by downer

People also ask

Which command can be used to execute pig script with parameter file?

Similar to regular Pig parameter substitution, you can define parameters using -param/-param_file on Pig's command line. These variables are treated as binding variables when the Pig Latin script is bound. For example, you can invoke a Python script using: pig -param loadfile=student.txt script.py
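For jobs with several parameters it can be cleaner to keep them in a file and pass it with -param_file. A minimal sketch; the file and parameter names here are only illustrative:

params.txt (one key=value pair per line):

loadfile=student.txt
outdir=/user/hadoop/output

Command Line:

pig -param_file params.txt script.pig

Inside the script, each key is then available as '$loadfile', '$outdir', and so on.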

How can we run the HDFS commands in pig?

First, load the data file with PigStorage:

product = LOAD 'hdfs://localhost:9000/product_dir/products.csv'
    USING PigStorage(',')
    AS (product_id:int, product_name:chararray, price:int);
dump product;

Then you can execute a script file that is stored in HDFS.

What is the difference between exec and run commands in pig?

Unlike the run command, exec does not change the command history or remember the handles used inside the script. exec without any parameters can be used in scripts to force execution up to the point in the script where the exec occurs.
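The difference is easiest to see from the Grunt shell. A sketch, with a hypothetical script name:

grunt> run myscript.pig    -- aliases defined in the script remain visible in the shell
grunt> exec myscript.pig   -- runs in a fresh context; aliases and history are not carried over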


1 Answer

This showed up in another question, but you can indicate the input parameter on the command line and use it when loading. For example:

Command Line:

pig -f script.pig -param input=somefile.txt

script.pig:

raw = LOAD '$input' AS (...);

Note that if you are using Amazon Elastic MapReduce (EMR), '$input' is what gets substituted with whatever input you provide to the script.
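The same mechanism works for multiple parameters, and %default inside the script lets it run even when a flag is omitted. A sketch; the parameter names and paths are only illustrative:

Command Line:

pig -f script.pig -param input=somefile.txt -param output=results

script.pig:

%default output 'results';
raw = LOAD '$input' AS (line:chararray);
STORE raw INTO '$output';

Here -param values given on the command line override any %default in the script.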

answered Oct 02 '22 by rjzii