I wanted to run a Python wheel as a Spark job using the api/2.0/jobs/runs/submit API endpoint, providing package_name and entry_point:
{
    "existing_cluster_id": self.cluster_id,
    "python_wheel_task": {
        "package_name": "my.package",
        "entry_point": "my_method"
    },
    "libraries": [
        {"whl": "dbfs:/FileStore/jars/1e023c35_ca3a_42c0_958b_fa308124ccc3/my_lib-0.0.1-py3-none-any.whl"}
    ]
}
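For context, a minimal sketch of how I submit that payload (the host, token, and run_name here are placeholders, not values from my actual setup):

import requests

# Placeholder workspace URL and token -- substitute your own values
HOST = "https://<databricks-instance>"
TOKEN = "<personal-access-token>"

payload = {
    "run_name": "wheel-run",
    "existing_cluster_id": "<cluster_id>",
    "python_wheel_task": {
        "package_name": "my.package",
        "entry_point": "my_method"
    },
    "libraries": [
        {"whl": "dbfs:/FileStore/jars/1e023c35_ca3a_42c0_958b_fa308124ccc3/my_lib-0.0.1-py3-none-any.whl"}
    ]
}

# Submit a one-time run via the Jobs API
resp = requests.post(
    f"{HOST}/api/2.0/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # the response contains the run_id of the submitted run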
However, while processing the request, Databricks tries to use %conda magic commands (which manage Python package dependencies within a notebook scope using familiar pip and conda syntax). These are not supported on the standard Databricks Runtime; only Databricks Runtime ML 6.4+ supports them:
Py4JJavaError: An error occurred while calling t.getCondaEnvState.
: org.apache.spark.SparkException: Conda magic is only available on Databricks Runtime for Machine Learning.
at com.databricks.backend.daemon.driver.PythonDriverLocal$PythonEntryPointInterfaceImpl.getCondaEnvState(PythonDriverLocal.scala:265)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Is there any other option to run a Python wheel, other than using spark_python_task combined with a script that imports the wheel's entry point and runs it?
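(For reference, that workaround looks roughly like this: a tiny launcher script uploaded to DBFS and referenced by spark_python_task. The names mirror the payload above and are placeholders:)

# runner.py -- launcher for the spark_python_task workaround; assumes the
# wheel installs my.package and exposes my_method (placeholder names)
from my.package import my_method

if __name__ == "__main__":
    my_method()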
Thanks in advance.
I believe this should be achievable by specifying the libraries field (see the docs), although I don't remember how it is handled on existing clusters (and I can't check right now). Can you try something like this:
{
    "existing_cluster_id": <cluster_id>,
    "python_wheel_task": {
        "package_name": <package_name>,
        "entry_point": <entry_point>
    },
    "libraries": [
        {"whl": "dbfs:/FileStore/my-lib.whl"}
    ]
}
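One more thing worth checking: as far as I remember, python_wheel_task resolves entry_point against the entry points declared in the wheel's metadata, so your setup.py should declare something like the sketch below. The distribution and module names here mirror your payload and are assumptions on my side, not a confirmed build configuration:

# setup.py for the wheel -- a sketch; distribution and module names are
# placeholders inferred from the question, not verified
from setuptools import setup, find_packages

setup(
    name="my_lib",
    version="0.0.1",
    packages=find_packages(),
    entry_points={
        # "my_method" must match entry_point in the python_wheel_task payload
        "console_scripts": [
            "my_method = my.package.main:my_method",
        ],
    },
)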