I have used Domino Data Lab for a while, where I could start a Python or R session on a single machine, without using Spark.
Is it possible to do the same with Azure Databricks? That is, to start a notebook session with Python but without Spark (and without a cluster)?
An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. You run these workloads as a set of commands in a notebook or as an automated job.
A data analytics workload is not automated. For example, commands within Azure Databricks notebooks run on Apache Spark clusters until they are manually terminated. Multiple users can share a cluster and analyze data on it collaboratively.
Run a Spark SQL job: perform the following tasks to create a notebook in Databricks, configure the notebook to read data from Azure Open Datasets, and then run a Spark SQL job on the data. In the left pane, select Azure Databricks. From the Common Tasks, select New Notebook.
You always have to have a "cluster", but it can be a single-node cluster (driver node only). Set the number of worker nodes to zero for this configuration. Note that you cannot run Spark jobs on a driver-only cluster. See, for example, Databricks' example of using MXNet on a driver-only cluster.
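If you want to create such a driver-only cluster programmatically rather than through the UI, a minimal sketch against the Databricks Clusters API looks like the following. The workspace URL, token, node type, and runtime version are placeholders, not values from this answer — substitute your own:

```python
import json

# Placeholder values -- replace with your workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi-..."  # personal access token

# Driver-only cluster: the key setting is num_workers = 0.
payload = {
    "cluster_name": "driver-only",
    "spark_version": "7.3.x-scala2.12",   # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
    "num_workers": 0,
}

# To actually create the cluster (requires the `requests` package):
# import requests
# resp = requests.post(
#     f"{WORKSPACE_URL}/api/2.0/clusters/create",
#     headers={"Authorization": f"Bearer {TOKEN}"},
#     data=json.dumps(payload),
# )
# print(resp.json())

print(json.dumps(payload, indent=2))
```

The request itself is left commented out so the sketch stays self-contained; only the payload shape matters here.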
Another option, available since the September 2020 platform release, is the Single Node cluster. You can select "Single Node" as the Cluster Mode to create a single-node cluster with Spark running in local mode.
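For reference, the equivalent Clusters API payload for a Single Node cluster is sketched below: the `spark_conf` entries and the `ResourceClass` tag correspond to what the UI's Single Node mode configures, while the node type and runtime version are placeholder assumptions:

```python
import json

single_node_cluster = {
    "cluster_name": "single-node",
    "spark_version": "7.3.x-scala2.12",   # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
    "num_workers": 0,
    # Spark runs in local mode on the driver node:
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

print(json.dumps(single_node_cluster, indent=2))
```

Unlike a plain zero-worker cluster, this configuration can still run Spark workloads, since Spark executes in local mode on the driver.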
Databricks now has single-node clusters in public preview.