 

Run Azure Databricks without Spark cluster

I have used Domino Data Lab for a while, where I was able to start a Python or R session on a single machine, without using Spark.

Is it possible to do the same with Azure Databricks? That is, to start a notebook session with Python without Spark (and without a cluster)?

Asked Oct 11 '18 by Bruno Ferreira

People also ask

What is an Azure Databricks cluster?

An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. You run these workloads as a set of commands in a notebook or as an automated job.

How do I create a notebook in Azure Databricks?

Perform the following tasks to create a notebook in Databricks, configure the notebook to read data from Azure Open Datasets, and then run a Spark SQL job on the data. In the left pane, select Azure Databricks.

Is the data analytics workload automated in Azure Databricks?

The data analytics workload is not automated. For example, commands within Azure Databricks notebooks run on Apache Spark clusters until they are manually terminated. Multiple users can share a cluster to analyze data collaboratively.

How do I run a Spark SQL job in Databricks?

Perform the following tasks to create a notebook in Databricks, configure the notebook to read data from Azure Open Datasets, and then run a Spark SQL job on the data. In the left pane, select Azure Databricks. From the Common Tasks, select New Notebook.


2 Answers

You always have to have a "cluster", but it can be a single-node cluster (a driver node only): set the number of worker nodes to zero when configuring it. Note that you cannot run Spark jobs on a driver-only cluster. See, for example, running MXNet on a driver-only cluster.
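On such a driver-only cluster, notebook code executes as ordinary single-machine Python, which is exactly the Domino-style workflow the question asks about. A minimal sketch using pandas (which ships with Databricks runtimes); no SparkContext is touched:

```python
# Plain single-machine Python, as it would run on a driver-only cluster:
# everything stays in the driver's local memory, no Spark involved.
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
print(df["y"].mean())  # ordinary pandas aggregation on the driver
```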

Another option, available since the September 2020 platform release, is the Single Node cluster. Select "Single Node" as the Cluster Mode to create a single-node cluster with Spark running in local mode.
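The same configuration can be created programmatically through the Databricks Clusters API (`POST /api/2.0/clusters/create`). A hedged sketch, stdlib only; the workspace URL and token are placeholders, and `spark_version` and `node_type_id` are example values you would replace with ones valid in your workspace:

```python
# Sketch: create a single-node cluster via the Databricks Clusters API.
# Placeholder values throughout -- adapt host, token, runtime, VM type.
import json
import urllib.request

payload = {
    "cluster_name": "single-node-demo",
    "spark_version": "7.3.x-scala2.12",   # example runtime version
    "node_type_id": "Standard_DS3_v2",    # example Azure VM type
    "num_workers": 0,                     # driver only, no workers
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",       # Spark runs in local mode
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

def create_cluster(host: str, token: str) -> str:
    """POST the payload; host is e.g. a https://...azuredatabricks.net URL."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/create",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["cluster_id"]
```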

Answered Oct 05 '22 by Hauke Mallow


Databricks now has single-node clusters out in public preview.

Answered Oct 05 '22 by malthe