
How to register Python function as UDF in SparkSQL in Java/Scala?

I have a few very simple functions in Python that I would like to use as UDFs in Spark SQL. Registering and using them from Python seems easy, but I would like to use them from Java/Scala via JavaSQLContext or SQLContext. I noticed that Spark 1.2.1 has a function registerPython, but it is neither clear to me how to use it nor whether I should ...

Any ideas on how to do this? I think it might have gotten easier in 1.3.0, but I'm limited to 1.2.1.

EDIT: As I'm no longer working on this, I'm interested in knowing how to do this in any Spark version.

kkonrad asked Mar 19 '15 11:03


People also ask

How do I register a function as UDF in Spark Scala?

In Spark, you create a UDF by writing a function in the language you use with Spark. For example, if you are using Spark with Scala, you write the function in Scala and either wrap it with the udf() function to use it on a DataFrame, or register it as a UDF to use it in SQL.

What is UDF function in Scala?

User-Defined Functions (UDFs) are user-programmable routines that act on one row. The Spark SQL documentation lists the classes that are required for creating and registering UDFs. It also contains examples that demonstrate how to define and register UDFs and invoke them in Spark SQL.


1 Answer

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import StringType

sc = SparkContext()
sqlContext = SQLContext(sc)

def dummy_function(parameter_key):
    return "abc"

# Register the Python function under the name "dummy_function",
# declaring its return type so Spark SQL knows what it produces
sqlContext.udf.register("dummy_function", dummy_function, StringType())

This is how we can define a function and register it for use in any Spark SQL query.

Harshal Taware answered Oct 13 '22 11:10