
Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist in PySpark

This is the snippet:

from pyspark import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext()
spark = SparkSession(sc)
d = spark.read.format("csv").option("header", True).option("inferSchema", True).load('file.csv')
d.show()

Running this produces the following error:

An error occurred while calling o163.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist

All the other methods work well. I have researched this a lot, but in vain. Any lead will be highly appreciated.

Asked by Trupti J, Dec 23 '22 02:12


1 Answer

This indicates a Spark version mismatch. Before Spark 2.3, the show method took only two arguments (besides self):

def show(self, n=20, truncate=True):

Since 2.3 it takes three:

def show(self, n=20, truncate=True, vertical=False):

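In your case the Python client seems to invoke the newer, three-argument version, while the JVM backend only provides the older one. That is why Py4J reports that showString(Integer, Integer, Boolean) does not exist.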
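As a rough illustration of the signature difference described above, the sketch below checks whether a function declares a `vertical` parameter. The two `show_*` functions are hypothetical stand-ins that only copy the signatures quoted in this answer; they are not PySpark code, and `accepts_vertical` is a helper name made up for this example.

```python
import inspect


def show_pre_23(self, n=20, truncate=True):
    """Stand-in with the pre-2.3 signature quoted above (not real PySpark code)."""


def show_23(self, n=20, truncate=True, vertical=False):
    """Stand-in with the 2.3+ signature quoted above (not real PySpark code)."""


def accepts_vertical(func):
    """Return True if func declares a parameter named `vertical`."""
    return "vertical" in inspect.signature(func).parameters


print(accepts_vertical(show_pre_23))  # False
print(accepts_vertical(show_23))      # True

# With PySpark installed, the same check could be applied to the real method:
#   from pyspark.sql import DataFrame
#   accepts_vertical(DataFrame.show)
```

If the check returns True for the installed `DataFrame.show`, the Python side is 2.3 or newer, which matches the three-argument call seen in the error.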

SparkContext initialization changed significantly in 2.4, so a mismatch involving 2.4 would already have failed in SparkContext.__init__. You are therefore most likely using:

  • 2.3.x Python library.
  • 2.2.x JARs.

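You can confirm that by comparing the versions directly from your session. Python library:

import pyspark
pyspark.__version__  # sc.version is read from the JVM, so it cannot reveal a mismatch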

vs. JVM:

sc._jsc.version()
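A minimal sketch of how the two version strings could be compared, assuming only the major and minor components matter for compatibility. The helper names `major_minor` and `versions_match` are made up for this example, and the version strings in the demo are sample values, not output from a real session.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '2.3.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(python_version, jvm_version):
    """True when the Python library and the JVM agree on major.minor."""
    return major_minor(python_version) == major_minor(jvm_version)


# Sample values only; they mirror the mismatch suspected in this answer.
print(versions_match("2.3.1", "2.2.0"))  # False
print(versions_match("2.3.1", "2.3.0"))  # True

# With a live session (not run here), the call would look like:
#   import pyspark
#   versions_match(pyspark.__version__, sc._jsc.version())
```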

Problems like this are usually the result of a misconfigured PYTHONPATH (either set directly, or caused by using a pip-installed PySpark on top of pre-existing Spark binaries) or a misconfigured SPARK_HOME.
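To see which of these settings is in play, something like the following sketch can help. It only reads environment variables and asks Python where it would import pyspark from; the function names and the "spark"/"py4j" substring filter are assumptions made for this example, not part of any Spark tooling.

```python
import importlib.util
import os


def spark_env_report(environ=os.environ):
    """Collect the settings that influence which PySpark and which JARs are used."""
    pythonpath = environ.get("PYTHONPATH", "")
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return {
        "SPARK_HOME": environ.get("SPARK_HOME"),
        # PYTHONPATH entries that look like they point into a Spark distribution
        "spark_pythonpath_entries": [
            p for p in entries if "spark" in p.lower() or "py4j" in p.lower()
        ],
    }


def pyspark_location():
    """Path of the pyspark package Python would import, or None if none is found."""
    spec = importlib.util.find_spec("pyspark")
    return spec.origin if spec else None


if __name__ == "__main__":
    print(spark_env_report())
    print(pyspark_location())
```

If the reported pyspark location sits in site-packages while SPARK_HOME points at a separately downloaded distribution, the Python library and the JARs may well come from different releases.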

Answered by 10465355, Jan 03 '23 05:01