"'JavaPackage' object is not callable" error executing explain() in Pyspark 3.0.1 via Zeppelin

Tags:

apache-spark

pyspark

I am running PySpark 3.0.1 for Hadoop 2.7 in a Zeppelin notebook. Everything generally works, but when I call df.explain() on a DataFrame I get this error:

Fail to execute line 3: df.explain()
Traceback (most recent call last):
  File "/tmp/1610595392738-0/zeppelin_python.py", line 158, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 3, in <module>
  File "/usr/local/spark/python/pyspark/sql/dataframe.py", line 356, in explain
    print(self._sc._jvm.PythonSQLUtils.explainString(self._jdf.queryExecution(), explain_mode))
TypeError: 'JavaPackage' object is not callable

Has anyone come across and resolved this error before in the context of explain()?

My spark/jars folder contents:

activation-1.1.1.jar
aircompressor-0.10.jar
algebra_2.12-2.0.0-M2.jar
alluxio-2.4.1-client.jar
antlr4-runtime-4.7.1.jar
antlr-runtime-3.5.2.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.6.1.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
arpack_combined_all-0.1.jar
arrow-format-0.15.1.jar
arrow-memory-0.15.1.jar
arrow-vector-0.15.1.jar
audience-annotations-0.5.0.jar
automaton-1.11-8.jar
avro-1.8.2.jar
avro-ipc-1.8.2.jar
avro-mapred-1.8.2-hadoop2.jar
bonecp-0.8.0.RELEASE.jar
breeze_2.12-1.0.jar
breeze-macros_2.12-1.0.jar
cats-kernel_2.12-2.0.0-M4.jar
chill_2.12-0.9.5.jar
chill-java-0.9.5.jar
commons-beanutils-1.9.4.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.2.jar
commons-compiler-3.0.16.jar
commons-compress-1.8.1.jar
commons-configuration-1.6.jar
commons-crypto-1.0.0.jar
commons-dbcp-1.4.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.9.jar
commons-logging-1.1.3.jar
commons-math3-3.4.1.jar
commons-net-3.1.jar
commons-pool-1.5.4.jar
commons-text-1.6.jar
compress-lzf-1.0.3.jar
core-1.1.2.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
datanucleus-api-jdo-4.2.4.jar
datanucleus-core-4.1.17.jar
datanucleus-rdbms-4.1.19.jar
derby-10.12.1.1.jar
dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
flatbuffers-java-1.9.0.jar
generex-1.0.2.jar
gson-2.2.4.jar
guava-14.0.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.7.4.jar
hadoop-auth-2.7.4.jar
hadoop-client-2.7.4.jar
hadoop-common-2.7.4.jar
hadoop-hdfs-2.7.4.jar
hadoop-mapreduce-client-app-2.7.4.jar
hadoop-mapreduce-client-common-2.7.4.jar
hadoop-mapreduce-client-core-2.7.4.jar
hadoop-mapreduce-client-jobclient-2.7.4.jar
hadoop-mapreduce-client-shuffle-2.7.4.jar
hadoop-yarn-api-2.7.4.jar
hadoop-yarn-client-2.7.4.jar
hadoop-yarn-common-2.7.4.jar
hadoop-yarn-server-common-2.7.4.jar
hadoop-yarn-server-web-proxy-2.7.4.jar
HikariCP-2.5.1.jar
hive-beeline-2.3.7.jar
hive-cli-2.3.7.jar
hive-common-2.3.7.jar
hive-exec-2.3.7-core.jar
hive-jdbc-2.3.7.jar
hive-llap-common-2.3.7.jar
hive-metastore-2.3.7.jar
hive-serde-1.2.1.spark2.jar
hive-serde-2.3.7.jar
hive-shims-0.23-2.3.7.jar
hive-shims-1.2.1.spark2.jar
hive-shims-2.3.7.jar
hive-shims-common-2.3.7.jar
hive-shims-scheduler-2.3.7.jar
hive-storage-api-2.7.1.jar
hive-vector-code-gen-2.3.7.jar
hk2-api-2.6.1.jar
hk2-locator-2.6.1.jar
hk2-utils-2.6.1.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.5.6.jar
httpcore-4.4.12.jar
istack-commons-runtime-3.0.8.jar
ivy-2.4.0.jar
jackson-annotations-2.10.0.jar
jackson-core-2.10.0.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.10.0.jar
jackson-dataformat-yaml-2.10.0.jar
jackson-datatype-jsr310-2.10.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.10.0.jar
jackson-module-paranamer-2.10.0.jar
jackson-module-scala_2.12-2.10.0.jar
jackson-xc-1.9.13.jar
jakarta.activation-api-1.2.1.jar
jakarta.annotation-api-1.3.5.jar
jakarta.inject-2.6.1.jar
jakarta.validation-api-2.0.2.jar
jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api-2.3.2.jar
janino-3.0.16.jar
javassist-3.25.0-GA.jar
javax.inject-1.jar
javax.jdo-3.2.0-m3.jar
javax.servlet-api-3.1.0.jar
javolution-5.5.1.jar
jaxb-api-2.2.2.jar
jaxb-runtime-2.3.2.jar
jcl-over-slf4j-1.7.30.jar
jdo-api-3.0.1.jar
jersey-client-2.30.jar
jersey-common-2.30.jar
jersey-container-servlet-2.30.jar
jersey-container-servlet-core-2.30.jar
jersey-hk2-2.30.jar
jersey-media-jaxb-2.30.jar
jersey-server-2.30.jar
jetty-6.1.26.jar
jetty-sslengine-6.1.26.jar
jetty-util-6.1.26.jar
JLargeArrays-1.5.jar
jline-2.14.6.jar
joda-time-2.10.5.jar
jodd-core-3.5.2.jar
jpam-1.1.jar
json-1.8.jar
json4s-ast_2.12-3.6.6.jar
json4s-core_2.12-3.6.6.jar
json4s-jackson_2.12-3.6.6.jar
json4s-scalap_2.12-3.6.6.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
jta-1.1.jar
JTransforms-3.1.jar
jul-to-slf4j-1.7.30.jar
kryo-shaded-4.0.2.jar
kubernetes-client-4.9.2.jar
kubernetes-model-4.9.2.jar
kubernetes-model-common-4.9.2.jar
leveldbjni-all-1.8.jar
libfb303-0.9.3.jar
libthrift-0.12.0.jar
log4j-1.2.17.jar
logging-interceptor-3.12.6.jar
lz4-java-1.7.1.jar
machinist_2.12-0.6.8.jar
macro-compat_2.12-1.1.1.jar
mesos-1.4.0-shaded-protobuf.jar
metrics-core-4.1.1.jar
metrics-graphite-4.1.1.jar
metrics-jmx-4.1.1.jar
metrics-json-4.1.1.jar
metrics-jvm-4.1.1.jar
minlog-1.3.0.jar
netty-all-4.1.47.Final.jar
objenesis-2.5.1.jar
okhttp-3.12.6.jar
okio-1.15.0.jar
opencsv-2.3.jar
orc-core-1.5.10.jar
orc-mapreduce-1.5.10.jar
orc-shims-1.5.10.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.3.jar
paranamer-2.8.jar
parquet-column-1.10.1.jar
parquet-common-1.10.1.jar
parquet-encoding-1.10.1.jar
parquet-format-2.4.0.jar
parquet-hadoop-1.10.1.jar
parquet-jackson-1.10.1.jar
postgresql-42.2.14.jar
protobuf-java-2.5.0.jar
py4j-0.10.9.jar
pyrolite-4.30.jar
RoaringBitmap-0.7.45.jar
scala-collection-compat_2.12-2.1.1.jar
scala-compiler-2.12.10.jar
scala-library-2.12.10.jar
scala-parser-combinators_2.12-1.1.2.jar
scala-reflect-2.12.10.jar
scala-xml_2.12-1.2.0.jar
shapeless_2.12-2.3.3.jar
shims-0.7.45.jar
slf4j-api-1.7.30.jar
slf4j-log4j12-1.7.30.jar
snakeyaml-1.24.jar
snappy-java-1.1.7.5.jar
spark-catalyst_2.12-3.0.1.jar
spark-core_2.12-3.0.1.jar
spark-graphx_2.12-3.0.1.jar
spark-hive_2.12-3.0.1.jar
spark-hive-thriftserver_2.12-3.0.1.jar
spark-kubernetes_2.12-3.0.1.jar
spark-kvstore_2.12-3.0.1.jar
spark-launcher_2.12-3.0.1.jar
spark-mesos_2.12-3.0.1.jar
spark-mllib_2.12-3.0.1.jar
spark-mllib-local_2.12-3.0.1.jar
spark-network-common_2.12-3.0.1.jar
spark-network-shuffle_2.12-3.0.1.jar
spark-repl_2.12-3.0.1.jar
spark-sketch_2.12-3.0.1.jar
spark-sql_2.12-3.0.1.jar
spark-streaming_2.12-3.0.1.jar
spark-tags_2.12-3.0.1.jar
spark-tags_2.12-3.0.1-tests.jar
spark-unsafe_2.12-3.0.1.jar
spark-yarn_2.12-3.0.1.jar
spire_2.12-0.17.0-M1.jar
spire-macros_2.12-0.17.0-M1.jar
spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12-0.17.0-M1.jar
ST4-4.0.4.jar
stax-api-1.0.1.jar
stax-api-1.0-2.jar
stream-2.9.6.jar
super-csv-2.2.0.jar
threeten-extra-1.5.0.jar
transaction-api-1.1.jar
univocity-parsers-2.9.0.jar
velocity-1.5.jar
xbean-asm7-shaded-4.15.jar
xercesImpl-2.12.0.jar
xml-apis-1.4.01.jar
xmlenc-0.52.jar
xz-1.5.jar
zjsonpatch-0.3.0.jar
zookeeper-3.4.14.jar
zstd-jni-1.4.4-3.jar

I gather the error is saying that something might be missing from my classpath, but I can't think what that might be.


asked Jan 14 '21 04:01

1 Answer

I ran into the same issue on AWS with EMR 6.2.0 (also Spark 3.0.1, coincidentally?) and Jupyter notebooks. The issue appears to be related to how PySpark is initialized, specifically the py4j Java imports.

The following import is supposed to be executed while the notebook kernel is being initialized but seems to be skipped. You just need to run this once per session.

from py4j.java_gateway import java_import
# Make PythonSQLUtils (the class df.explain() calls) resolvable by its short name in the JVM view
java_import(spark._sc._jvm, "org.apache.spark.sql.api.python.*")

Now df.explain() works as expected.

For future reference - when you see 'JavaPackage' object is not callable, it often means that the target Java class was not found. Either the class doesn't exist or the expected import hasn't been called.
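To check for this condition before applying the workaround, something like the sketch below may help. It relies on py4j handing back a JavaPackage placeholder for any name it cannot resolve to a class. The helper names is_unresolved and ensure_explain_helper are hypothetical (they are not part of PySpark or py4j), and the type-name comparison is just a convenience so the helper can be defined without importing py4j up front.

```python
def is_unresolved(jvm_member):
    """Return True if py4j resolved the name to a package placeholder, not a class."""
    # py4j returns a JavaPackage object for names it cannot resolve to a class.
    # Comparing by type name is an assumption made to avoid a hard py4j import here.
    return type(jvm_member).__name__ == "JavaPackage"


def ensure_explain_helper(jvm, importer=None):
    """Run the missing import only if PythonSQLUtils is not resolvable.

    Returns True if the import was performed, False if it was not needed.
    `importer` defaults to py4j's java_import; it is a parameter only so the
    logic can be exercised without a running JVM.
    """
    if not is_unresolved(jvm.PythonSQLUtils):
        return False
    if importer is None:
        from py4j.java_gateway import java_import as importer
    importer(jvm, "org.apache.spark.sql.api.python.*")
    return True


# In a notebook where `spark` is the SparkSession, usage might look like:
#   ensure_explain_helper(spark._sc._jvm)
#   df.explain()
```

Calling it at the top of a notebook is harmless when the import has already happened, since it then does nothing and returns False.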


answered Nov 28 '22 12:11

Mike Park


Phil
