 

How do I configure Azure Databricks to use VSTS for Source Control

I recently began using Azure Databricks and am comparing it to Jupyter Notebooks running on HDInsight. I have searched around and read the documentation to learn how to have ADBricks use VSTS git for source control, but I have not found a solution that works.

I have found instructions for using other git providers, but I want to be clear that is not an option for this use-case so please refrain from those types of responses.

HDInsight has similar limitations, but I could work around them via ssh/rsync. That was fine because I was deploying to the remote server the same way a build would, and I was able to do blue/green deployments and the like.

For ADBricks, the cluster-on-demand is amazing, but there is an assumption that you're developing in Notebooks "on the cluster" and effectively you're in Continuous Delivery mode. This is just fine with me (except for the less-than-adequate, high-latency notebook development), but I still need to automate getting code to VSTS periodically to save state/backup like a good coder should :).

asked Mar 06 '23 02:03 by jatal


1 Answer

Typically for full CI/CD in Azure Databricks we use the workspace API to pull and push whole notebooks or directories from Databricks to a user's local machine or a build server. https://docs.azuredatabricks.net/api/latest/workspace.html
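As a rough illustration of the pull direction, here is a minimal Python sketch that exports a single notebook through the Workspace API's export endpoint (`/api/2.0/workspace/export`) and writes it to a local file, for example inside a clone of the VSTS repo. The host URL, token, and paths are placeholders you would supply, the helper function names are made up for this example, and authentication is assumed to use a personal access token sent as a Bearer header.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host, token, notebook_path, fmt="SOURCE"):
    """Build a GET request for the Workspace API export endpoint."""
    query = urllib.parse.urlencode({"path": notebook_path, "format": fmt})
    url = f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"
    # Assumption: a personal access token is passed as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def decode_export_response(body):
    """The endpoint returns JSON whose 'content' field is base64-encoded."""
    return base64.b64decode(json.loads(body)["content"])


def export_notebook(host, token, notebook_path, local_path):
    """Download one notebook and write it to a file in a git working copy."""
    request = build_export_request(host, token, notebook_path)
    with urllib.request.urlopen(request) as response:
        source = decode_export_response(response.read())
    with open(local_path, "wb") as handle:
        handle.write(source)
```

Once the file is on disk, committing and pushing it to VSTS is ordinary git usage; nothing on the Databricks side needs to know which git provider is involved.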

Databricks also has a CLI that leverages the workspace API for easier, higher-level commands: https://docs.azuredatabricks.net/user-guide/dev-tools/databricks-cli.html
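For the periodic backup described in the question, one possible approach is a small script that runs the CLI's directory export and then commits the result. The sketch below assumes the legacy `databricks-cli` syntax (`databricks workspace export_dir`, with `-o` to overwrite local files; newer CLI versions may name the command differently), that the CLI is already configured with a host and token, and that `local_dir` is an existing clone of your VSTS repository. The function names are hypothetical.

```python
import subprocess


def export_dir_command(workspace_dir, local_dir):
    """Command that pulls a workspace directory to disk (legacy CLI syntax)."""
    return ["databricks", "workspace", "export_dir", "-o", workspace_dir, local_dir]


def git_command(local_dir, *args):
    """Build a git command that runs inside the local clone."""
    return ["git", "-C", local_dir, *args]


def backup(workspace_dir, local_dir, message="Periodic notebook backup"):
    """Export notebooks, then commit and push only if something changed."""
    subprocess.run(export_dir_command(workspace_dir, local_dir), check=True)
    subprocess.run(git_command(local_dir, "add", "--all"), check=True)
    status = subprocess.run(
        git_command(local_dir, "status", "--porcelain"),
        check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False  # nothing new to commit
    subprocess.run(git_command(local_dir, "commit", "-m", message), check=True)
    subprocess.run(git_command(local_dir, "push"), check=True)
    return True
```

Running `backup(...)` from a scheduler or a build job would give the periodic save-state behaviour, with VSTS acting as a plain git remote.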

The workflow for this looks something like this: [image: workflow diagram]

Here is a blog post from Databricks that goes into more detail: https://databricks.com/blog/2017/10/30/continuous-integration-continuous-delivery-databricks.html

answered Apr 29 '23 01:04 by shoeboxer