
How to execute Spark code locally with databricks-connect?

Is there a way to execute Spark code locally with databricks-connect?

The reason is that I would like to execute some tests as part of my CI/CD pipeline without the need to have a cluster up and running.

asked Jul 24 '19 by flappy

People also ask

Can Databricks run locally?

Unfortunately, a local instance of Databricks is not available; Databricks runs in the cloud only, on Microsoft Azure and AWS. If you want to try Databricks, you can use the Databricks Community Edition, which is free of cost.

How do I run Spark in Databricks?

Run a Spark SQL job: In the left pane, select Azure Databricks. From the Common Tasks, select New Notebook. In the Create Notebook dialog box, enter a name, select Python as the language, and select the Spark cluster that you created earlier. Select Create.

How do I run code in Databricks?

To run a shell command on all nodes, use an init script. The %fs magic allows you to use dbutils filesystem commands; for example, to run the dbutils.fs.ls command to list files, you can specify %fs ls instead. For more information, see How to work with files on Azure Databricks.
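As a small illustration of that equivalence (note that dbutils is only predefined inside a Databricks notebook, and the path used here is just an example):

```python
# Inside a Databricks notebook, dbutils is available without an import.
# The %fs magic is shorthand for dbutils.fs commands, so a cell containing
#   %fs ls /databricks-datasets
# is equivalent to the Python call below.
files = dbutils.fs.ls("/databricks-datasets")  # returns a list of FileInfo objects
for f in files:
    print(f.path, f.size)
```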


1 Answer

No, databricks-connect requires a running cluster. But if you do not use any Databricks-specific code (like dbutils), you can run Spark locally and execute your tests against that, assuming you can still access the data sources you need.
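For example, here is a minimal sketch of that approach using pytest with a local SparkSession; the fixture and test names are hypothetical, not part of any Databricks API:

```python
# Minimal sketch: run CI tests against a local SparkSession instead of
# databricks-connect. No cluster is required.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # local[2] runs Spark in-process with two worker threads
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("ci-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_transform(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    assert df.filter(df.id > 1).count() == 1
```

Because master("local[2]") runs Spark in-process, a test suite like this needs only pyspark installed on the CI runner, with no cluster or databricks-connect configuration.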

answered Sep 27 '22 by simon_dmorias