
Databricks-GitHub integration, automatically add all notebooks to repository

I'm trying to set up GitHub integration for Databricks.
We have hundreds of notebooks there, and it would be exhausting to add every notebook manually to the repo.

Is there a way to automatically commit and push all notebooks from Databricks to the repository?

Viacheslav Shalamov asked Nov 06 '18 09:11




1 Answer

Since no one has answered this in 3 months, I'll post my own solution.

Under the /Shared/ directory in Databricks we have notebooks that should be synced to the repository under notebooks/Shared/.
I run this script on a regular basis, which keeps all notebooks in the repo up to date.

databricks workspace export_dir /Shared ./notebooks/Shared -o
git add --all
git commit -m "shared notebooks updated"
git push

The -o flag overwrites existing notebooks with the latest version.
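
If the script runs unattended, one thing to watch for is that git commit exits with an error when nothing has changed since the last export. Below is a rough sketch of the same four steps wrapped in a function that skips the commit and push in that case. The function name sync_notebooks and its two optional arguments are my own invention for illustration; the defaults are the paths used above. It assumes the current directory is the repository root and that the branch already has an upstream configured.

```shell
#!/usr/bin/env bash
# Sketch only: sync_notebooks is a hypothetical helper name, not part of
# databricks-cli. Defaults mirror the paths used in the answer.

sync_notebooks() {
    local workspace_dir="${1:-/Shared}"
    local local_dir="${2:-./notebooks/Shared}"

    # Export the workspace folder; -o overwrites existing local copies.
    databricks workspace export_dir "$workspace_dir" "$local_dir" -o || return 1

    # Stage every change in the repository, as in the original script.
    git add --all || return 1

    # "git diff --cached --quiet" exits 0 when nothing is staged,
    # so there is nothing to commit or push.
    if git diff --cached --quiet; then
        echo "no notebook changes to commit"
        return 0
    fi

    git commit -m "shared notebooks updated" && git push
}
```

Calling sync_notebooks with no arguments from the repository root does the same as the four commands above, and it could be run from cron or any other scheduler without failing on days when no notebook was edited.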

More information here: https://databricks.com/blog/2017/11/08/introducing-command-line-interface-for-databricks-developers.html

Note that you first have to set up and configure databricks-cli on your machine: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#set-up-the-cli

Viacheslav Shalamov answered Sep 20 '22 07:09