 

Submit Python script to Databricks JOB

Is it possible to submit/configure a Spark Python script (.py) file to a Databricks job?

I develop in the PyCharm IDE and push/commit the code to our GitLab repository. My requirement is to create a new job on the Databricks cluster whenever a Python script is merged to the GitLab master branch.

I would like some suggestions on whether it's possible to create a Databricks job from a Python script using a `.gitlab-ci.yml` pipeline.

In the Databricks Jobs UI I can see options for a Spark JAR or a notebook, but I'm wondering if we can provide a Python file instead.

Thanks,

Yuva

Asked by Yuva on Oct 20 '25
1 Answer

This functionality is not currently available in the Databricks UI, but it is accessible via the REST API. You'll want to use the SparkPythonTask data structure.

You'll find this example in the official documentation:

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/jobs/create <<JSON
{
  "name": "SparkPi Python job",
  "new_cluster": {
    "spark_version": "5.2.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "spark_python_task": {
    "python_file": "dbfs:/docs/pi.py",
    "parameters": [
      "10"
    ]
  }
}
JSON
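The same request can be made from Python with only the standard library, which is convenient if your CI job runs in a Python image. This is a minimal sketch: the host, token, cluster settings, and `build_job_payload` helper are illustrative placeholders, not part of the official Databricks client.

```python
import json
import urllib.request

# Hypothetical placeholders -- substitute your workspace URL and a
# personal access token (e.g. from GitLab CI/CD variables).
DATABRICKS_HOST = "https://<databricks-instance>"
TOKEN = "<personal-access-token>"

def build_job_payload(job_name, python_file, parameters):
    """Build the JSON body for POST /api/2.0/jobs/create with a spark_python_task."""
    return {
        "name": job_name,
        "new_cluster": {
            "spark_version": "5.2.x-scala2.11",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "spark_python_task": {
            "python_file": python_file,
            "parameters": parameters,
        },
    }

def create_job(payload):
    """POST the payload to the Jobs API and return the new job's id."""
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.0/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

# Mirrors the curl example above; the .py file must already be on DBFS.
payload = build_job_payload("SparkPi Python job", "dbfs:/docs/pi.py", ["10"])
```

Note that `python_file` points at a DBFS path, so your pipeline would typically upload the script (e.g. via the DBFS API) before creating the job.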

If you need help getting started with the REST API, see the Databricks REST API documentation.
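To tie this back to the GitLab requirement in the question, the API call can be triggered from a pipeline that runs only on the master branch. This is a hedged sketch of a `.gitlab-ci.yml` job, assuming `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are defined as CI/CD variables and the request body is committed as `job-config.json`; all names here are placeholders to adapt.

```yaml
create_databricks_job:
  stage: deploy
  only:
    - master
  script:
    # Post the job definition (spark_python_task) to the Jobs API.
    - >
      curl -s -X POST
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}"
      -H "Content-Type: application/json"
      -d @job-config.json
      "${DATABRICKS_HOST}/api/2.0/jobs/create"
```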

Answered by Raphael K on Oct 22 '25

