After using df.write.csv to export my Spark DataFrame to a CSV file, I get the following error message:
 ~\AppData\Local\Programs\Python\Python39\lib\site-packages\py4j\protocol.py
 in get_return_value(answer, gateway_client, target_id, name)
     324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
     325             if answer[1] == REFERENCE_TYPE:
     326                 raise Py4JJavaError(
     327                     "An error occurred while calling {0}{1}{2}.\n".
     328                     format(target_id, ".", name), value)
 
 Py4JJavaError: An error occurred while calling o58.csv. :
 org.apache.spark.SparkException: Job aborted.
Any help is welcome, since I can't make sense of what's going on here, especially as this seems like a straightforward operation.
EDIT: Posting the whole code
from pyspark.sql.types import *
import pandasql as sqldf
import pyspark
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session and enable eager evaluation for notebook display
spark = SparkSession.builder.appName('SIAF').getOrCreate()
spark.conf.set('spark.sql.repl.eagerEval.enabled', True)
sc = spark.sparkContext
spark
spark_df=spark.read.csv(r'C:\Users\...\SIAF_2.csv',sep = ',', header=True, inferSchema=True)
df = spark_df.select(
    [
        "MENU",
        "NOM_SISTEMA",
        "DSC_GRP_USUARIO",
        "NOM_USUARIO",
        "NOM_FUNCIONARIO",
        "IND_ATIVO",
        "DAT_DESLIGAMENTO",
    ]
).where(
    (spark_df["MENU"].isNotNull())
    & (spark_df["IND_ATIVO"] == "S")
    & (spark_df["DAT_DESLIGAMENTO"].isNull())
).sort(
    spark_df["MENU"], ascending=True
)
df.show(5)
df.write.csv(
    "C:/Users/.../spark_test", mode="overwrite", sep=",", header=True
)
The issue was with the Java SDK (JDK) version. Currently PySpark only supports JDK versions 8 and 11 (the most recent one is 17). To download a legacy version of the JDK, head to https://www.oracle.com/br/java/technologies/javase/jdk11-archive-downloads.html and download version 11 (note: you will need to provide a valid e-mail address and password to create an Oracle account).
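To make sure Spark actually picks up the supported JDK, you can point JAVA_HOME at the JDK 11 install before creating the SparkSession. A minimal sketch, assuming a hypothetical install path (adjust it to wherever JDK 11 landed on your machine):

import os
import subprocess

# Hypothetical JDK 11 location -- replace with your actual install path
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11.0.15"
os.environ["PATH"] = (
    os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep + os.environ["PATH"]
)

# `java -version` writes to stderr; this should now report version 11
print(subprocess.run(["java", "-version"], capture_output=True, text=True).stderr)

# Only create the SparkSession *after* JAVA_HOME is set, so the JVM
# gateway is launched with the right JDK.

Setting the variables in the same process only works if no SparkSession has been started yet; otherwise restart the Python kernel first.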
I had the same error, but I found a thread that solved my problem. In my case, I downloaded the correct version of winutils from https://github.com/cdarlint/winutils. From the bin folder of the matching Hadoop version I downloaded hadoop.dll and put it in the same path as winutils.exe, for example "C:\Spark\spark-3.2.1-bin-hadoop3.2\bin".
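For Spark to find those binaries on Windows, HADOOP_HOME must point at the folder whose bin subfolder contains winutils.exe (and now hadoop.dll), and that bin folder must be on PATH before the session starts. A minimal sketch, assuming the example path above:

import os

# Folder whose bin\ subfolder holds winutils.exe and hadoop.dll
os.environ["HADOOP_HOME"] = r"C:\Spark\spark-3.2.1-bin-hadoop3.2"
os.environ["PATH"] = (
    os.path.join(os.environ["HADOOP_HOME"], "bin") + os.pathsep + os.environ["PATH"]
)
# Set these before SparkSession.builder...getOrCreate() runs (or restart
# the kernel), so the JVM inherits the updated environment.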