I have a few very simple functions in Python that I would like to use as UDFs in Spark SQL. It is easy to register and use them from Python, but I would like to use them from Java/Scala through JavaSQLContext or SQLContext. I noticed that Spark 1.2.1 has a function registerPython, but it is not clear to me how to use it, nor whether I should ...
Any ideas on how to do this? I think it may have gotten easier in 1.3.0, but I'm limited to 1.2.1.
EDIT: As I'm no longer working on this, I'm interested in knowing how to do this in any Spark version.
In Spark, you create a UDF by writing a function in the language you are using with Spark. For example, if you are using Spark with Scala, you write the function in Scala and then either wrap it with the udf() function (for use on DataFrames) or register it as a UDF (for use in SQL).
User-Defined Functions (UDFs) are user-programmable routines that act on one row. The Spark SQL documentation lists the classes required for creating and registering UDFs, and contains examples that demonstrate how to define and register UDFs and invoke them in Spark SQL.
from pyspark.sql import SQLContext
from pyspark.sql.types import StringType

sqlContext = SQLContext(sc)  # sc is an existing SparkContext

def dummy_function(parameter_key):
    return "abc"

sqlContext.udf.register("dummy_function", dummy_function, StringType())
This is how we can define a function and register it for use in any Spark SQL query.