
key not found: _PYSPARK_DRIVER_CALLBACK_HOST

I'm trying to run this code:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .master("local") \
        .appName("Word Count") \
        .getOrCreate()

df = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, 5.4, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, 144.5, 5.9, 33, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, 5.2, 23, 'F'),
    (5, 129.2, 5.3, 42, 'M'),
], ['id', 'weight', 'height', 'age', 'gender'])

df.show()
print('Count of Rows: {0}'.format(df.count()))
print('Count of distinct Rows: {0}'.format(df.distinct().count()))

spark.stop()

and I get this error:

18/06/22 11:58:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
    ...
Exception: Java gateway process exited before sending its port number

I'm using PyCharm and MacOS, Python 3.6, Spark 2.3.1

What could be causing this error?

bboy asked Jun 22 '18 12:06


3 Answers

I got the same error, key not found: _PYSPARK_DRIVER_CALLBACK_HOST, while upgrading to Spark 3.1.1.

What worked for me was upgrading pyspark via pip install pyspark==3.1.1, installing findspark, and then running the following lines before starting the SparkSession:

import findspark
findspark.init()

Camila A. González Williamson answered Oct 10 '22 21:10


This error is the result of a version mismatch. The environment variable referenced in the traceback (_PYSPARK_DRIVER_CALLBACK_HOST) was removed when the Py4J dependency was updated to 0.10.7, and that change was backported to the 2.3 branch in 2.3.1.

Considering version information:

I'm using PyCharm and MacOS, Python 3.6, Spark 2.3.1

it looks like you have the 2.3.1 package installed, but SPARK_HOME points to an older (2.3.0 or earlier) installation.
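
One way to check for this kind of mismatch is to print what the interpreter imports next to what SPARK_HOME contains. The snippet below is only a rough sketch: the helper names bundled_py4j_versions and describe_environment are made up for illustration, and it assumes the usual layout of a Spark distribution, where the bundled Py4J sits at $SPARK_HOME/python/lib/py4j-<version>-src.zip.

```python
import glob
import os
import re


def bundled_py4j_versions(spark_home):
    """Return the Py4J versions found under <spark_home>/python/lib (hypothetical helper)."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    versions = []
    for path in sorted(glob.glob(pattern)):
        match = re.match(r"py4j-(.+)-src\.zip$", os.path.basename(path))
        if match:
            versions.append(match.group(1))
    return versions


def describe_environment():
    """Print the pieces that need to come from the same Spark release."""
    spark_home = os.environ.get("SPARK_HOME")
    print("SPARK_HOME:", spark_home)
    if spark_home:
        # Assumption: a standard Spark layout with python/lib inside SPARK_HOME.
        print("Py4J bundled with SPARK_HOME:", bundled_py4j_versions(spark_home))
    try:
        import pyspark  # the package this interpreter actually picks up
        print("pyspark package:", pyspark.__version__,
              "at", os.path.dirname(pyspark.__file__))
    except ImportError:
        print("pyspark package: not importable from this interpreter")


if __name__ == "__main__":
    describe_environment()
```

If the pyspark package reports 2.3.1 while SPARK_HOME bundles a Py4J older than 0.10.7, that would be consistent with the mismatch described above. In PyCharm, SPARK_HOME is often set in the run configuration, so it is worth running this from the same configuration that fails.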

Alper t. Turker answered Nov 04 '22 11:11


This fix also takes care of the "key not found: _PYSPARK_DRIVER_CALLBACK_HOST / Java gateway / PySpark 2.3.1" error. Add the following to your bashrc, /etc/environment or /etc/profile, using the Py4J zip that is actually present in $SPARK_HOME/python/lib (py4j-0.10.7-src.zip for Spark 2.3.1):

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

That should do the doobie right there. You may thank me in advance. #thumbsup :)
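
If editing shell startup files is not convenient (for example when launching from an IDE), the same two path entries can be added from Python before pyspark is imported. This is a minimal sketch, not part of Spark: add_spark_to_sys_path is a hypothetical helper, and it looks up the Py4J zip by pattern so the version does not have to be hard-coded.

```python
import glob
import os
import sys


def add_spark_to_sys_path(spark_home):
    """Prepend Spark's Python sources and its bundled Py4J zip to sys.path."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip under " + lib_dir)
    # A distribution normally ships exactly one Py4J zip; if several are
    # present, the last one in lexicographic order is used (an assumption).
    added = [python_dir, py4j_zips[-1]]
    for entry in reversed(added):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return added
```

A typical call would be add_spark_to_sys_path(os.environ["SPARK_HOME"]) at the top of the script, followed by the usual from pyspark.sql import SparkSession.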

SCOTT McNEIL answered Nov 04 '22 11:11