 

Why "databricks-connect test" does not work after configurate Databricks Connect?

I want to run my Spark processes directly on my cluster from IntelliJ IDEA, so I'm following this documentation: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html

After configuring everything, I run databricks-connect test, but I don't get the Scala REPL that the documentation says I should.

[screenshot of the databricks-connect test output]

This is my cluster configuration:

[screenshot of the cluster configuration]

asked Dec 13 '22 by Eric Bellet

1 Answer

I solved the problem. The cause was mismatched versions of all the tools, so here is the full setup that worked for me:

  • Install Java

Download and install Java SE Runtime Version 8.

Download and install Java SE Development Kit 8.
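
To sanity-check the Java install, open a new PowerShell window and confirm that version 8 is the one on your PATH (anything else tends to cause exactly this kind of version trouble):

java -version
# Expect something like: java version "1.8.0_..."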

  • Install Conda

You can either download and install full-blown Anaconda or use Miniconda.
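
Once it is installed, open a fresh prompt and make sure conda itself works:

conda --version
conda info --envs   # lists your environments; a fresh install only has "base"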

  • Download WinUtils

This pesky bugger is part of Hadoop and is required by Spark to work on Windows. For a quick install, open PowerShell (as an admin) and run the following (if you are on a corporate network with strict security, you may need to download the exe manually):

# Create the folder, download winutils.exe into it, and point HADOOP_HOME at it
New-Item -Path "C:\Hadoop\Bin" -ItemType Directory -Force
Invoke-WebRequest -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe -OutFile "C:\Hadoop\Bin\winutils.exe"
[Environment]::SetEnvironmentVariable("HADOOP_HOME", "C:\Hadoop", "Machine")
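
Since HADOOP_HOME was set at machine level, it is only visible in a NEW PowerShell window. A quick check that everything took effect (winutils.exe should print its usage text when run with no arguments):

# In a new PowerShell window:
[Environment]::GetEnvironmentVariable("HADOOP_HOME", "Machine")   # should print C:\Hadoop
& "C:\Hadoop\Bin\winutils.exe"                                    # should print usage/help if it runs
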
  • Create Virtual Environment

We now create a new virtual environment. I recommend creating one environment per project you are working on. This allows us to install a different version of Databricks-Connect per project and upgrade them separately.

From the Start menu, find the Anaconda Prompt. When it opens, it will have a default prompt of something like:

(base) C:\Users\User

The (base) part means you are not in a virtual environment, but rather the base install. To create a new environment, execute this:

conda create --name dbconnect python=3.5

Here dbconnect is the name of your environment and can be whatever you want. Databricks currently runs Python 3.5, so your local Python version must match. Again, this is a good reason to have one environment per project, as the required version may change in the future.

  • Now activate the environment:

    conda activate dbconnect
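
With the environment active, it's worth confirming the interpreter really matches the cluster's Python version before going further:

python --version                                  # expect Python 3.5.x
python -c "import sys; print(sys.executable)"     # should point inside the dbconnect environment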

  • Install Databricks-Connect

You are now good to go:

pip install -U databricks-connect==5.3.*

databricks-connect configure

[screenshot of the databricks-connect configure prompts]
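
configure stores your answers (workspace URL, token, cluster ID, org ID, port) in a .databricks-connect file in your home directory, so if the test fails later you can inspect what was saved (assuming the default location):

Get-Content "$env:USERPROFILE\.databricks-connect"   # small JSON file with host, token, cluster_id, org_id, port

Once the cluster-side settings below are in place too, databricks-connect test should drop you into the Scala REPL the documentation describes.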

  • Create a Databricks cluster (in this case I used Amazon Web Services) and set the following Spark config on it:

[screenshot of the cluster creation settings]

spark.databricks.service.server.enabled true
spark.databricks.service.port 15001

Use port 15001 for Amazon and 8787 for Azure.
  • Turn Windows Defender Firewall off, or allow access through it (a sketch of a firewall rule follows below).
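
If you would rather not turn the firewall off entirely, an outbound rule for the service port may be enough. A minimal sketch, assuming the AWS port 15001 (run PowerShell as admin; note that Windows normally allows outbound traffic by default, so this only matters if outbound is restricted):

New-NetFirewallRule -DisplayName "Databricks Connect" -Direction Outbound -Action Allow -Protocol TCP -RemotePort 15001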
answered Jan 17 '23 by Eric Bellet