I want to run my Spark processes directly on my cluster from IntelliJ IDEA, so I'm following this documentation: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html
After configuring everything, I run databricks-connect test
but I don't get the Scala REPL that the documentation describes.
This is my cluster configuration:
I solved the problem. It was caused by the versions of the tools, which all have to match:
Download and install Java SE Runtime Version 8.
Download and install Java SE Development Kit 8.
You can either download and install full-blown Anaconda or use Miniconda.
winutils.exe: this pesky bugger is part of Hadoop and is required by Spark to work on Windows. For a quick install, open PowerShell (as an admin) and run the following (if you are on a corporate network with funky security, you may need to download the exe manually):
New-Item -Path "C:\Hadoop\Bin" -ItemType Directory -Force
Invoke-WebRequest -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe -OutFile "C:\Hadoop\Bin\winutils.exe"
[Environment]::SetEnvironmentVariable("HADOOP_HOME", "C:\Hadoop", "Machine")
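To confirm that the three commands above took effect, a small check along these lines can help. It is only a sketch: the function name check_winutils is made up for this example, and it assumes winutils.exe sits in a bin folder under HADOOP_HOME, as in the commands above. Because the variable is set at machine level, it will only be visible in a shell opened after the change.

```python
import os


def check_winutils(env=None):
    """Return a list of problems found with the HADOOP_HOME / winutils.exe setup.

    check_winutils is a hypothetical helper, not part of Spark or Databricks.
    """
    env = os.environ if env is None else env
    problems = []

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
        return problems

    # Assumption: winutils.exe lives in <HADOOP_HOME>\bin, as installed above.
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        problems.append("winutils.exe not found at {}".format(winutils))

    return problems
```

Calling check_winutils() with no arguments inspects the current process environment; an empty list means nothing obviously wrong was found.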
We now create a new virtual environment. I recommend creating one environment per project you are working on. This allows us to install different versions of Databricks-Connect per project and upgrade them separately.
From the Start menu find the Anaconda Prompt. When it opens it will have a default prompt of something like:
(base) C:\Users\User
The base part means you are not in a virtual environment but in the base install. To create a new environment, execute this:
conda create --name dbconnect python=3.5
Here dbconnect is the name of your environment and can be whatever you want. Databricks currently runs Python 3.5, and your Python version must match. This is another good reason for having an environment per project, as the required version may change in the future.
Now activate the environment:
conda activate dbconnect
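Since the whole problem came down to version mismatches, it may be worth verifying that the interpreter in the activated environment really is the one the cluster expects. A minimal sketch, assuming the cluster runs Python 3.5 as stated above (REQUIRED and python_matches are names invented for this example):

```python
import sys

# Assumption: the cluster runs Python 3.5, as stated above. Adjust if yours differs.
REQUIRED = (3, 5)


def python_matches(required=REQUIRED, actual=None):
    """Return True if the (major, minor) version equals the required one."""
    if actual is None:
        actual = sys.version_info
    # Only major and minor are compared; the patch level is ignored.
    return tuple(actual)[:2] == tuple(required)[:2]
```

Running python_matches() inside the dbconnect environment should return True; if it returns False, recreate the environment with the right python= version before installing anything else.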
Install Databricks-Connect
You are now good to go:
pip install -U databricks-connect==5.3.*
databricks-connect configure
On the cluster side, these settings go into the cluster's Spark config:
spark.databricks.service.server.enabled true
spark.databricks.service.port 15001 (Amazon 15001, Azure 8787)
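Once the client and the cluster are both configured, a tiny job is a quick way to see whether the connection works from Python. This is only a sketch: run_smoke_test is a made-up name, and it assumes the pyspark package installed by databricks-connect is the one in the active environment and that the cluster is running.

```python
def run_smoke_test(n=100):
    """Count n rows on the cluster and return the count reported by Spark.

    run_smoke_test is a hypothetical helper for illustration only.
    """
    # Imported lazily so this file can still be loaded where pyspark is absent.
    from pyspark.sql import SparkSession

    # With databricks-connect configured, the session is expected to talk to
    # the remote cluster rather than a local Spark instance.
    spark = SparkSession.builder.getOrCreate()
    return spark.range(n).count()
```

If run_smoke_test() returns the number of rows you asked for, the job ran end to end. If it hangs or raises, recheck the versions and the port setting above.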