Several people (1, 2, 3) have discussed using a Scala UDF in a PySpark application, usually for performance reasons. I am interested in the opposite: using a Python UDF in a Scala Spark project.
I am particularly interested in building a model using sklearn (and MLflow), then efficiently applying it to records in a Spark streaming job. I know I could also host the Python model behind a REST API and call that API from the Spark streaming application in mapPartitions, but managing concurrency for that task and setting up an API to host the model isn't something I'm super excited about.
Is this possible without too much custom development with something like Py4J? Is this just a bad idea?
Thanks!
In Spark, you create a UDF by writing a function in the language you're using for Spark. For example, if you are using Spark with Scala, you write the function in Scala and either wrap it with the udf() function (to use it on DataFrames) or register it as a UDF (to use it in SQL).

It is well known that the use of UDFs (User Defined Functions) in Apache Spark, especially through the Python API, can compromise application performance. For this reason, at Damavis we try to avoid their use as much as possible in favour of native functions or SQL.
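To make the two styles concrete, here is a minimal Python sketch. The names (squared, make_dataframe_udf, register_sql_udf) are illustrative, not from any Spark API, and the pyspark imports are deferred inside the helpers so the sketch can be read and run without a live Spark installation:

```python
def squared(x):
    # Trivial example logic; a real UDF would hold your model call.
    return None if x is None else float(x) * float(x)

def make_dataframe_udf():
    # Hypothetical helper: wraps `squared` with udf() for DataFrame
    # expressions, e.g. df.select(make_dataframe_udf()(df["x"])).
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    return udf(squared, DoubleType())

def register_sql_udf(spark):
    # Hypothetical helper: registers `squared` under a SQL-visible name,
    # e.g. for spark.sql("SELECT squared_udf(x) FROM t").
    from pyspark.sql.types import DoubleType
    return spark.udf.register("squared_udf", squared, DoubleType())
```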
Maybe I'm late to the party, but at least I can help with this for posterity. This is actually achievable by creating your Python UDF and registering it with spark.udf.register("my_python_udf", foo). You can view the docs here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.UDFRegistration.register
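A minimal sketch of the Python side, assuming the function wraps your sklearn model's per-record prediction. The names double_feature and register_model_udf are illustrative, and the pyspark import is deferred so the sketch runs even where pyspark isn't installed:

```python
def double_feature(x):
    # Illustrative stand-in for calling model.predict on one value.
    return None if x is None else 2.0 * x

def register_model_udf(spark, name="my_python_udf"):
    # Hypothetical helper: registers the function on an existing
    # SparkSession so Spark SQL can resolve it by name.
    from pyspark.sql.types import DoubleType
    return spark.udf.register(name, double_feature, DoubleType())
```

With a session in hand, register_model_udf(spark) makes SELECT my_python_udf(col) valid in any SQL that same session parses, whether the query is submitted from Python or from Scala.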
This function can then be called from the sqlContext in Python, Scala, Java, R, or any language really, because you're accessing the sqlContext directly (where the udf is registered). For example, you would call something like:

spark.sql("SELECT my_python_udf(...)").show()
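Since the call is just a SQL string resolved by name, a small helper can sketch the pattern. Everything here (the table and column names, the helper itself) is illustrative:

```python
def scoring_query(table="events", udf_name="my_python_udf", column="feature"):
    # Hypothetical helper: builds the SQL that applies the registered UDF.
    # The resulting string is what you would pass to spark.sql(...), and
    # the Scala side uses the exact same spark.sql call, which is the
    # point of registering the UDF by name.
    return f"SELECT *, {udf_name}({column}) AS prediction FROM {table}"

# e.g. spark.sql(scoring_query()).show() once "my_python_udf" is registered
```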
PROS - You get to call your sklearn model from Scala.

CONS - You have to use the sqlContext and write SQL-style queries.
I hope this helps, at least for any future visitors.