 

Can I pass parameters to UDFs in Pig script?

Tags:

apache-pig

I am relatively new to Pig scripting. Is there a way to pass parameters to Java UDFs in Pig?

Here is the scenario: I have a log file with several columns, each holding a primary key from another table. My task is to count the distinct primary key values in a selected column. I have written a Pig script that gets the distinct primary keys and counts them. However, as it stands I have to write a new UDF for each column. Is there a better way to do this? If I could pass a row number as a parameter to the UDF, I would not need to write multiple UDFs.

emkay asked Oct 31 '12 17:10




2 Answers

The way to do it is with DEFINE and the UDF's constructor: the arguments you give in the DEFINE statement are passed to the constructor as strings. Here is an example of a custom "splitter":

REGISTER com.sample.MyUDFs.jar;
DEFINE CommaSplitter com.sample.MySplitter(',');

B = FOREACH A GENERATE f1, CommaSplitter(f2);

Hopefully that conveys the idea.
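On the Java side, the pattern might look roughly like the sketch below. It is only an illustration: DelimiterSplitter is a hypothetical stand-in for com.sample.MySplitter, and it is written as a plain class so it compiles without Pig on the classpath. In a real UDF the class would extend org.apache.pig.EvalFunc, and the logic in split() would live in exec(Tuple input).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a Pig UDF such as com.sample.MySplitter.
// Assumption: a real UDF would extend org.apache.pig.EvalFunc and do
// this work inside exec(Tuple input).
class DelimiterSplitter {
    private final String delimiter;

    // Arguments given in DEFINE arrive here as strings,
    // e.g. DEFINE CommaSplitter com.sample.MySplitter(',');
    DelimiterSplitter(String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.delimiter = delimiter;
    }

    // Splits one field value on the configured delimiter.
    List<String> split(String value) {
        if (value == null) {
            return Collections.emptyList();
        }
        // Pattern.quote makes the delimiter literal (so '|' or '.' work);
        // the -1 limit keeps trailing empty fields.
        return Arrays.asList(value.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        DelimiterSplitter commaSplitter = new DelimiterSplitter(",");
        System.out.println(commaSplitter.split("a,b,c")); // prints [a, b, c]
    }
}
```

For the column-counting case in the question, the same idea should apply: pass the column index as a string in DEFINE (for example '3') and convert it with Integer.parseInt in the constructor, so one UDF class can serve every column.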

troutinator answered Oct 22 '22 06:10



To pass parameters, call the UDF like this in your Pig script:

UDF(document, '$param1', '$param2', '$param3')

Edit: I'm not sure whether those parameters need to be wrapped in ' ' or not.

Then, in your UDF:

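import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class UDF extends EvalFunc<Boolean> {

    @Override
    public Boolean exec(Tuple input) throws IOException {
        // Expect the document (field 0) plus the three parameters (fields 1-3).
        if (input == null || input.size() < 4)
            return false;

        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());

        // Each extra argument from the script arrives as a field of the
        // input tuple; here each one is treated as a path to open.
        String var1 = input.get(1).toString();
        InputStream var1In = fs.open(new Path(var1));

        String var2 = input.get(2).toString();
        InputStream var2In = fs.open(new Path(var2));

        String var3 = input.get(3).toString();
        InputStream var3In = fs.open(new Path(var3));

        // doyourthing is a placeholder for your own logic.
        return doyourthing(input.get(0).toString());
    }
}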

for example

Havnar answered Oct 22 '22 06:10
