Using Databricks Connect

I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface.

Unfortunately, after searching the web for a couple of days, I can't find detailed documentation on Databricks Connect.

I run databricks-connect configure, as suggested on the PyPI page above, but I'm not sure what some of the settings are. Could someone please walk me through this (like where to find these values in the web interface) or provide a link to proper documentation?

I know what some of the settings should be, but I'll include everything that comes up when running databricks-connect configure, for completeness and for the benefit of others.

Databricks Host
Databricks Token
Cluster ID (e.g., 0921-001415-jelly628)
Org ID (Azure-only, see ?o=orgId in URL)
Port (is it spark.databricks.service.port ?)

Also, and I think this is what I'm most interested in, do I need to make any changes in the notebook itself, such as defining a SparkContext or something? If so, with what configuration?

And how should I run it? After running databricks-connect configure, there doesn't seem to be any "magic" happening. When I run jupyter notebook, it still runs locally and doesn't seem to know to forward the work to a remote cluster.

Update: If you'd like to think of something more concrete, in Databricks' web interface, dbutils is a predefined object. How do I refer to it when running a notebook remotely?

Arseny asked Mar 05 '19 18:03



2 Answers

I had marked another person's reply as the answer, but that reply is gone now for some reason.

For my purposes, the official user guide worked: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html

Arseny answered Dec 21 '22 12:12



In short, you will need to include the following at the start of each script:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Notebooks should convert, but magic commands (%run, etc.) will not work.

More detail on the parts that will not work is available here: https://datathirst.net/blog/2019/3/7/databricks-connect-finally
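
On the dbutils part of the question, here is a rough sketch rather than a definitive recipe. As far as I know, with Databricks Connect installed you can build a dbutils object from the Spark session via pyspark.dbutils.DBUtils. The helper names get_spark and get_dbutils are made up for this example, and pyspark.dbutils is assumed to come from the databricks-connect package rather than stock PySpark, which is why the imports sit inside the functions.

```python
def get_spark():
    """Return the Spark session; with databricks-connect configured,
    this session targets the remote cluster."""
    # Imported lazily so this file can be loaded without pyspark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()


def get_dbutils(spark):
    """Build a dbutils handle for the given Spark session."""
    # Assumption: pyspark.dbutils ships with databricks-connect,
    # not with a stock PySpark install.
    from pyspark.dbutils import DBUtils
    return DBUtils(spark)


# Typical use at the top of a script (needs a configured databricks-connect):
#   spark = get_spark()
#   dbutils = get_dbutils(spark)
#   print(dbutils.fs.ls("dbfs:/"))
```

Inside the Databricks web interface both spark and dbutils already exist, so these helpers only matter when the same code runs from a local editor.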

simon_dmorias answered Dec 21 '22 11:12
