When I try to execute this line in PySpark:
arquivo = sc.textFile("dataset_analise_sentimento.csv")
I got the following error message:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.
I have tried the following steps:
1. Getting the context from the running session with sc = spark.sparkContext (found this possible solution in this question here on Stack Overflow; it didn't work for me; see the sketch after this list).
2. Changing PYSPARK_DRIVER_PYTHON from jupyter to ipython, as said in this link, with no success.
None of the steps above worked for me and I can't find a solution.
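For reference, here is a minimal sketch of that first attempt, assuming a standard local PySpark session (the app name is illustrative, not from the original post):

from pyspark.sql import SparkSession

# Reuse the context of the already-running session instead of building a new SparkContext.
spark = SparkSession.builder.master("local[*]").appName("teste").getOrCreate()
sc = spark.sparkContext
arquivo = sc.textFile("dataset_analise_sentimento.csv")  # still raises the same error in my case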
I am currently using the following versions:
Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.3.4
I just configured the following environment variables and now it is working normally:
HADOOP_HOME = C:\Hadoop
JAVA_HOME = C:\Java\jdk-11.0.6
PYSPARK_DRIVER_PYTHON = jupyter
PYSPARK_DRIVER_PYTHON_OPTS = notebook
PYSPARK_PYTHON = python
I am currently using the following versions:
Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.4.3, and Jupyter Notebook with PySpark.
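For anyone hitting the same error, here is a minimal sketch of the whole setup in one script, assuming the variables above are in place and winutils.exe lives under %HADOOP_HOME%\bin (the app name and the take(5) check are illustrative):

import os

# Assumed install locations; adjust them to your own Hadoop (winutils) and JDK paths.
os.environ["HADOOP_HOME"] = r"C:\Hadoop"
os.environ["JAVA_HOME"] = r"C:\Java\jdk-11.0.6"
os.environ["PYSPARK_PYTHON"] = "python"

from pyspark.sql import SparkSession

# Start a local session and read the CSV as an RDD, as in the question.
spark = SparkSession.builder.master("local[*]").appName("sentimento").getOrCreate()
sc = spark.sparkContext
arquivo = sc.textFile("dataset_analise_sentimento.csv")
print(arquivo.take(5))  # first five lines of the file instead of the Py4JJavaError

Note that JAVA_HOME and HADOOP_HOME normally need to be set before the Jupyter kernel (and hence the JVM) starts, so defining them as Windows system variables, as in the answer above, is the more reliable route; the os.environ lines are only a fallback that works if Spark has not been initialized yet in the session.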