Is it possible to submit/configure a Spark Python script (.py) file as a Databricks job?
I do my development in the PyCharm IDE and then commit/push the code to our GitLab repository. My requirement is to create a new job on the Databricks cluster whenever a Python script is merged to the GitLab master branch.
I would like suggestions on whether it is possible to create a Databricks job from a Python script using a .gitlab-ci.yml pipeline.
In the Databricks Jobs UI I can see that a Spark JAR or a notebook can be used, but I am wondering whether we can provide a plain Python file.
Thanks,
Yuva
This functionality is not currently available in the Databricks UI, but it is accessible via the REST API. You'll want to use the SparkPythonTask data structure.
You'll find this example in the official documentation:
curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/jobs/create <<JSON
{
"name": "SparkPi Python job",
"new_cluster": {
"spark_version": "5.2.x-scala2.11",
"node_type_id": "i3.xlarge",
"num_workers": 2
},
"spark_python_task": {
"python_file": "dbfs:/docs/pi.py",
"parameters": [
"10"
]
}
}JSON
If you need help getting started with the REST API, see here.
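Since you want to drive this from GitLab CI, one option is to call the same Jobs API from a small script that your pipeline runs on merges to master. Below is a minimal sketch in Python: it uploads the merged .py file to DBFS and then creates a job that runs it with spark_python_task. The environment variable names (DATABRICKS_HOST, DATABRICKS_TOKEN), file paths, and job name are assumptions for illustration, not anything Databricks or GitLab requires.

import base64
import os
import requests

# Assumed CI/CD variables (hypothetical names) holding your workspace URL and a personal access token.
HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<databricks-instance>
TOKEN = os.environ["DATABRICKS_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def upload_script(local_path, dbfs_path):
    """Upload the Python script to DBFS so spark_python_task can reference it."""
    with open(local_path, "rb") as f:
        contents = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        f"{HOST}/api/2.0/dbfs/put",
        headers=HEADERS,
        json={"path": dbfs_path, "contents": contents, "overwrite": True},
    )
    resp.raise_for_status()

def create_job(dbfs_path, job_name):
    """Create a job whose task runs the uploaded script on a new cluster."""
    payload = {
        "name": job_name,
        "new_cluster": {
            "spark_version": "5.2.x-scala2.11",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "spark_python_task": {"python_file": dbfs_path, "parameters": ["10"]},
    }
    resp = requests.post(f"{HOST}/api/2.0/jobs/create", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["job_id"]

if __name__ == "__main__":
    upload_script("pi.py", "dbfs:/docs/pi.py")
    print("Created job", create_job("dbfs:/docs/pi.py", "SparkPi Python job"))

You could then invoke a script like this from a .gitlab-ci.yml job that runs only on the master branch, so every merged Python script results in a corresponding Databricks job.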