I'm using Azure Databricks for data processing, with notebooks and pipelines. I'm not satisfied with my current workflow:
Great question. Definitely don't modify your production code in place.
One recommended pattern is to keep separate folders in your workspace for dev, staging, and prod. Do your development work in dev, then run tests in staging before finally promoting to production.
You can use the Databricks CLI to pull and push notebooks from one folder to another without breaking existing code. Going one step further, you can combine this pattern with git to sync with version control. In either case, the CLI gives you programmatic access to the workspace, which makes it easier to update code for production jobs. A sketch of the promotion step is shown below.
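As a rough sketch (the workspace paths and notebook name here are placeholders), promoting a notebook with the CLI's workspace commands might look like:

```
# Pull the notebook out of the staging folder as plain source
databricks workspace export --format SOURCE --overwrite \
    /Users/you@example.com/staging/etl_notebook ./etl_notebook.py

# Push it into the prod folder, overwriting the previous version
databricks workspace import --format SOURCE --language PYTHON --overwrite \
    ./etl_notebook.py /Users/you@example.com/prod/etl_notebook
```

Because the notebook round-trips through a local file, this is also a natural point to commit it to git.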
Regarding your second point about IDEs - Databricks offers Databricks Connect, which lets you use your IDE while running commands on a cluster. Based on your pain points I think this is a great solution for you, as it will give you more visibility into the functions you have defined, proper code navigation, and debugging. You can also write and run your unit tests this way.
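For instance, here is a minimal sketch assuming databricks-connect is installed and configured against your cluster (the table and column names are made up for illustration):

```python
# Runs in your local IDE; Spark operations execute on the remote Databricks cluster.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def add_revenue_column(df):
    """A locally defined transformation you can step through and unit test."""
    return df.withColumn("revenue", F.col("price") * F.col("quantity"))

orders = spark.read.table("sales.orders")  # hypothetical table
add_revenue_column(orders).show(5)
```

Because add_revenue_column is a plain Python function over DataFrames, you can unit test it locally (e.g. with pytest) against a small fixture DataFrame without touching production data.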
Once you have your scripts ready to go, you can always import them into the workspace as notebooks and run them as jobs. Also note that you can run .py scripts as a job using the REST API.
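A sketch of the latter using the Jobs runs/submit endpoint (the host, token, cluster spec, and DBFS path are all placeholders you'd replace with your own values):

```python
# Submit a .py script as a one-time run via the Jobs REST API.
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

payload = {
    "run_name": "nightly-etl",
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    },
    "spark_python_task": {
        "python_file": "dbfs:/scripts/etl.py",  # script uploaded to DBFS beforehand
        "parameters": ["--date", "2021-01-01"],
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # response includes the run_id you can poll for status
```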