SparkException: Python worker failed to connect back when execute spark action

When I try to execute this line in PySpark:

arquivo = sc.textFile("dataset_analise_sentimento.csv")

I got the following error message:

Py4JJavaError: An error occurred while calling z:
org.apache.spark.api.python.PythonRDD.runJob.: 
org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.

I have tried the following steps:

  • Checked the environment variables.
  • Checked the Apache Spark installation steps for Windows 10.
  • Used different versions of Apache Spark (tried 2.4.3 / 2.4.2 / 2.3.4).
  • Disabled the Windows firewall and the antivirus I have installed.
  • Tried to initialize the SparkContext manually with sc = spark.sparkContext (a possible solution found in this question here on Stack Overflow; it didn't work for me).
  • Tried to change the value of PYSPARK_DRIVER_PYTHON from jupyter to ipython, as suggested in this link, with no success.
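As a quick sanity check beyond the steps above (this is an assumption on my part, not something from the question): on Windows, "Python worker failed to connect back" often means the executor cannot launch the interpreter named by PYSPARK_PYTHON. A minimal sketch to verify that interpreter is resolvable from the same environment:

```python
import os
import shutil

# "Python worker failed to connect back" frequently means the executor
# could not start the interpreter named by PYSPARK_PYTHON.
# This checks whether that executable is on the PATH Spark will see.
python_exe = os.environ.get("PYSPARK_PYTHON", "python")
resolved = shutil.which(python_exe)
print(f"PYSPARK_PYTHON -> {python_exe!r}, resolved to: {resolved}")
# If `resolved` is None, Spark's workers will not find it either.
```

If this prints `None`, fixing PYSPARK_PYTHON (or the PATH) is the first thing to try.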

None of the steps above worked for me and I can't find a solution.

Currently I'm using the following versions:

Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.3.4

Henrique Branco asked Mar 29 '20


1 Answer

I just configured the following environment variables and now it's working normally:

  • HADOOP_HOME = C:\Hadoop
  • JAVA_HOME = C:\Java\jdk-11.0.6
  • PYSPARK_DRIVER_PYTHON = jupyter
  • PYSPARK_DRIVER_PYTHON_OPTS = notebook
  • PYSPARK_PYTHON = python
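
The same settings can also be applied from Python before the SparkSession/SparkContext is created (a minimal sketch, assuming the paths above match your installation; they are just the values from this answer, not universal defaults):

```python
import os

# Paths copied from the list above; adjust them to your own installation.
# These must be set BEFORE the SparkSession/SparkContext is created,
# because Spark reads them when it launches the JVM and Python workers.
os.environ["HADOOP_HOME"] = r"C:\Hadoop"
os.environ["JAVA_HOME"] = r"C:\Java\jdk-11.0.6"
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
os.environ["PYSPARK_PYTHON"] = "python"
```

Setting them system-wide (System Properties → Environment Variables) has the same effect and survives restarts.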

Currently I'm using the following versions:

Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.4.3, using Jupyter Notebook with PySpark.

Henrique Branco answered Oct 05 '22