Is there a way to do this, e.g., pass the name of the file to be processed?
Similar to regular Pig parameter substitution, you can define parameters using -param or -param_file on Pig's command line. Each such parameter is treated as one of the binding variables when the Pig Latin script is bound. For example, you can invoke an embedded Python script named script.py using: pig -param loadfile=student.txt script.py
First you need to load the data file using PigStorage:
product = LOAD 'hdfs://localhost:9000/product_dir/products.csv' USING PigStorage(',') AS (product_id:int, product_name:chararray, price:int);
DUMP product;
Next you can execute the script file, which is stored in HDFS.
Unlike the run command, exec does not change the command history or remember the handles used inside the script. Exec without any parameters can be used in scripts to force execution up to the point in the script where the exec occurs.
This showed up in another question, but you can indicate the input parameter on the command line and use that when you are loading, for example:
Command Line:
pig -f script.pig -param input=somefile.txt
script.pig:
raw = LOAD '$input' AS (...);
Note that if you are using Amazon Web Services Elastic MapReduce, whatever input you provide is passed to the script as '$input'.
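If you end up launching Pig from another program, the same command line can be assembled programmatically. The following is only a minimal sketch in Python: the helper names build_pig_command and run_pig are made up for illustration and are not part of Pig, and it assumes the pig executable is on your PATH. It uses only the -f and -param flags shown above.

```python
import shlex
import subprocess


def build_pig_command(script, params):
    """Return the argv list for running `script` with the given parameters.

    `build_pig_command` is a hypothetical helper, not a Pig API.
    """
    argv = ["pig", "-f", script]
    for name, value in params.items():
        # Each parameter becomes its own "-param name=value" pair.
        argv += ["-param", f"{name}={value}"]
    return argv


def run_pig(script, params):
    """Launch Pig; assumes the `pig` executable is available on PATH."""
    return subprocess.run(build_pig_command(script, params), check=True)


if __name__ == "__main__":
    argv = build_pig_command("script.pig", {"input": "somefile.txt"})
    # Print the command instead of running it, so this works without Pig.
    print(shlex.join(argv))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a file name contains spaces. Call run_pig("script.pig", {"input": "somefile.txt"}) to actually start the job.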