
Py4J error when creating a spark dataframe using pyspark

I have installed PySpark with Python 3.6, and I am using a Jupyter notebook to initialize a Spark session.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()

which runs without any errors

But when I write

df = spark.range(10)
df.show()

it throws an error:

Py4JError: An error occurred while calling o54.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:272)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

I don't know why I am facing this issue.

If I do,

from pyspark import SparkContext
sc = SparkContext()
print(sc.version)

'2.1.0'
asked Mar 02 '18 by Regressor

People also ask

What is Py4J in PySpark?

Py4J is a Java library that is integrated within PySpark and allows Python to dynamically interface with JVM objects. Py4J is therefore a mandatory module for running a PySpark application, and it ships with Spark at $SPARK_HOME/python/lib/py4j-*-src.zip.
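
As a quick illustration of that bridge (this sketch is not from the original page, and _jvm is an internal attribute, so treat it as illustrative only):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("py4j-demo").getOrCreate()

# _jvm is the Py4J gateway; attribute access returns proxies for JVM
# classes, and calling them executes the corresponding Java code.
jvm_time = spark.sparkContext._jvm.java.lang.System.currentTimeMillis()
print(jvm_time)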

Is PySpark sample random?

PySpark sampling (pyspark.sql.DataFrame.sample()) is a mechanism for getting random sample records from a dataset. This is helpful when you have a large dataset and want to analyze or test a subset of the data, for example 10% of the original file.
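
A short sketch of that call (the DataFrame and fraction here are illustrative, not from the original page):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-demo").getOrCreate()
df = spark.range(1000)

# Draw roughly 10% of the rows; `seed` makes the draw reproducible.
sampled = df.sample(withReplacement=False, fraction=0.1, seed=42)
print(sampled.count())  # approximately 100, not exactly 100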

Why does "getEncryptionEnabled does not exist in the JVM" occur?

The error "getEncryptionEnabled does not exist in the JVM" occurs when the Spark environment variables are not set correctly. Check that the environment variables in your .bashrc file are set correctly. For Unix and Mac, the variables should look something like the example below.
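
The snippet the text refers to was missing from the page; a plausible reconstruction follows, with installation paths that are assumptions you must adapt to your machine:

# Assumed install location -- point this at your actual Spark directory
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
# Make the bundled pyspark and py4j importable from Python;
# the py4j version varies by Spark release, so check $SPARK_HOME/python/lib
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH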

Do you need Java for PySpark?

PySpark requires Java version 7 or later and Python version 2.6 or later.


1 Answer

For me,

import findspark
findspark.init()  # locate the Spark installation and put its libraries on sys.path

import pyspark  # this import now resolves against the Spark found above

solved the problem.
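
For context (this note is not part of the original answer): findspark locates Spark via the SPARK_HOME environment variable and prepends Spark's own Python libraries to sys.path, which resolves mismatches between the notebook's pyspark and the installed Spark, the usual cause of "Method showString(...) does not exist" errors. If SPARK_HOME is not set, the location can be passed explicitly (the path below is an assumption):

import findspark
findspark.init("/opt/spark")  # hypothetical install path; omit the argument if SPARK_HOME is set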

answered Sep 25 '22 by zinyosrim