Databricks - Create Function (UDF) in Python

How can I create a function like the one described at https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html#create-function, but with the function defined in Python?

I have already done something like this:

from pyspark.sql.types import IntegerType
def relative_month(input_date):
  if input_date is not None:
    return ((input_date.month + 2) % 6)+1
  else:
    return None
_ = spark.udf.register("relative_month", relative_month, IntegerType())

But this UDF only works for the notebook that runs this piece of code.
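To make the arithmetic inside relative_month concrete, here is a small check in plain Python with no Spark involved. It is only a sketch: the year 2020 and the name `mapping` are arbitrary choices for illustration, not part of the original code.

```python
from datetime import date


def relative_month(input_date):
    """Same logic as the UDF above: map a date's month to a value in 1..6."""
    if input_date is not None:
        return ((input_date.month + 2) % 6) + 1
    return None


# Run every calendar month through the function (the year is arbitrary).
mapping = {m: relative_month(date(2020, m, 1)) for m in range(1, 13)}

for month, value in mapping.items():
    print(month, value)
```

With this formula April and October map to 1, and a None input stays None, which is the behavior the registered UDF should show for NULL dates.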

I want to do the same thing using SQL syntax to register the function, because some of my users will access Databricks through SQL clients and they will need the functions too.

The Databricks docs say that I can define a resource:

: (JAR|FILE|ARCHIVE) file_uri

Do I need to create a .py file and put it somewhere in my Databricks cluster?

Rafael Leinio asked Mar 07 '26 08:03



1 Answer

To share a session across notebooks, set spark.databricks.session.share to true in the cluster's configuration. Normally UDFs in Spark are temporary and specific to the application that registers them, so any other application has to register them again before using them. But, as I said, if you set spark.databricks.session.share to true, you can share a registered UDF across multiple notebooks.
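Because a temporary UDF has to be registered again in every application that uses it, one way to keep that manageable is to put the registration in a single helper and call it wherever the function is needed. The following is a minimal sketch, not a Databricks API: `register_udfs` and `UDFS` are hypothetical names, and the return type is given as the DDL string "int" (which `spark.udf.register` accepts in place of `IntegerType()`) so the snippet needs no imports.

```python
def relative_month(input_date):
    """Same logic as in the question: map a date's month to a value in 1..6."""
    if input_date is not None:
        return ((input_date.month + 2) % 6) + 1
    return None


# Hypothetical registry: SQL function name -> (Python function, DDL return type).
UDFS = {"relative_month": (relative_month, "int")}


def register_udfs(spark):
    """Register every entry of UDFS on the given SparkSession.

    Returns the sorted list of registered names.
    """
    for name, (func, return_type) in UDFS.items():
        spark.udf.register(name, func, return_type)
    return sorted(UDFS)
```

In a notebook this would be called once as `register_udfs(spark)`, after which `spark.sql("SELECT relative_month(current_date())")` should resolve the function in that session. Each session that is not shared still has to run the helper itself.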

If it is for Hive, you can register the UDF permanently, and it will then be accessible to multiple users.

Here is a similar thread on the same topic. See if it helps.

Databricks - Creating permanent User Defined Functions (UDFs)

Mohit Verma answered Mar 08 '26 22:03
