I'm trying to set up GitHub integration for Databricks.
We have hundreds of notebooks there, and it would be exhausting to add every notebook manually to the repo.
Is there some way to automatically commit and push all notebooks from Databricks to the repository?
Clone a remote Git repository: in the Add Repo dialog, click Clone remote Git repo and enter the repository URL. Select your Git provider from the drop-down menu, optionally change the name to use for the Databricks repo, and click Create. The contents of the remote repository are cloned to the Databricks repo.
Since no one has answered it for 3 months, I'll post my own solution.
Under the /Shared/ directory in Databricks we have notebooks that should be synced to the repository under notebooks/Shared/.
I run this script on a regular basis, thus keeping all notebooks up to date in the repo.
databricks workspace export_dir /Shared ./notebooks/Shared -o
git add --all
git commit -m "shared notebooks updated"
git push
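The commands above can be wrapped in a small script suitable for a cron job. This is a minimal sketch under my own assumptions: the function name, the guard for a missing CLI, and the commit-only-when-changed check are additions, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Sketch: export /Shared from the Databricks workspace and push only
# when the export actually changed something. Assumes this runs from
# the root of the Git repository.
set -euo pipefail

sync_shared_notebooks() {
    # -o overwrites local copies with the latest workspace versions.
    databricks workspace export_dir /Shared ./notebooks/Shared -o
    git add --all
    # Skip the commit when the export produced no staged changes.
    if git diff --cached --quiet; then
        echo "No notebook changes; nothing to commit."
    else
        git commit -m "shared notebooks updated"
        git push
    fi
}

# Only run when the CLI is actually installed and configured.
if command -v databricks >/dev/null 2>&1; then
    sync_shared_notebooks
else
    echo "databricks CLI not found; skipping sync" >&2
fi
```

The `git diff --cached --quiet` check avoids creating empty commits on runs where no notebook changed.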
The -o flag overwrites existing notebooks with the latest version.
More information here: https://databricks.com/blog/2017/11/08/introducing-command-line-interface-for-databricks-developers.html
Note, you first have to set up and configure databricks-cli on your machine: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#set-up-the-cli
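The one-time setup is short. A sketch assuming the legacy databricks-cli pip package and token-based authentication (your install method and auth mode may differ):

```shell
# Install the Databricks CLI used by the export command above.
pip install databricks-cli

# Interactive setup: prompts for the workspace host URL and a personal
# access token, and writes them to ~/.databrickscfg.
databricks configure --token
```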