I'm trying to install Spark on my Mac. I've used Homebrew to install Spark 2.4.0 and Scala. I've installed PySpark in my Anaconda environment and am using PyCharm for development. I've exported the following in my bash profile:
export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
However, I'm unable to get it to work.
From reading the traceback, I suspect this is due to the Java version. I would really appreciate some help fixing the issue. Please comment if there is any information beyond the traceback that would be helpful for me to provide.
I am getting the following error:
Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/pyspark/rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
Class file major version 55 corresponds to Java 11: a class was compiled by Java 11, and a component that only understands class files up to Java 8 (major version 52) refuses to load it, producing "Unsupported class file major version 55". In this case, Spark 2.4 is being run on Java 11, which it does not support.
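If you want to check which Java a particular class file targets, javap can show it. This is just an illustrative check; SomeClass below is a placeholder for a class or .class file on your machine:

# Print the class file version of a compiled class (placeholder name)
javap -verbose SomeClass | grep "major version"
# major version: 52  -> compiled for Java 8
# major version: 55  -> compiled for Java 11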
As of the latest Spark documentation: Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+. This should include JVMs on x86_64 and ARM64. It's easy to run locally on one machine; all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
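To see which Java your shell (and therefore PySpark) is currently picking up, a quick check along these lines works; output will of course vary by machine:

which java                  # which java binary is on PATH
java -version               # which version that binary is
echo "$JAVA_HOME"           # what, if anything, JAVA_HOME points at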
Edit: Spark 3.0 supports Java 11, so you'll need to upgrade:
Spark runs on Java 8/11, Scala 2.12, Python 2.7+/3.4+ and R 3.1+. Java 8 prior to version 8u92 support is deprecated as of Spark 3.0.0
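If you go the upgrade route instead, a rough sketch using the tools already mentioned in the question (Homebrew for Spark, pip inside the conda environment for PySpark); pick the actual 3.x versions yourself:

# Illustrative only: upgrade the Homebrew Spark formula
brew upgrade apache-spark
# ...and the PySpark package in the active conda environment
pip install --upgrade pyspark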
Original answer
Until Spark supports Java 11 or higher (which will hopefully be mentioned in the latest documentation when it does), you have to add a flag to set your Java version to Java 8.
As of Spark 2.4.x
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.4 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x)
On Mac/Unix, see asdf-java for installing different Java versions.
On a Mac, I am able to do this in my .bashrc:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
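After reloading the profile, it's worth confirming that the shell and macOS agree on Java 8. A quick check, not specific to Spark, might be:

source ~/.bashrc            # or open a new terminal
echo "$JAVA_HOME"           # should now point at a 1.8 JDK
java -version               # should report 1.8.0_xxx
/usr/libexec/java_home -V   # lists every JDK macOS knows about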
On Windows, check out Chocolatey, but seriously, just use WSL2 or Docker to run Spark.
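If you do go the Docker route, one commonly used community image bundles Jupyter, PySpark and a compatible JDK; the image name below is just an example, not something this answer depends on:

# Throwaway Jupyter + PySpark container with the notebook exposed on port 8888
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook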
You can also set this in spark-env.sh rather than setting the variable for your whole profile, as sketched below.
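A minimal sketch, assuming the Homebrew layout from the question (the conf directory will differ for other installs):

# $SPARK_HOME/conf/spark-env.sh  (copy spark-env.sh.template if the file doesn't exist)
# Point only Spark at Java 8, leaving the rest of your shell untouched
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)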
And, of course, this all means you'll need to install Java 8 in addition to your existing Java 11.
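A rough sketch of doing that with asdf-java (mentioned above); exact subcommands and version identifiers vary between asdf releases, so treat this as a pattern rather than something to copy verbatim:

asdf plugin add java
asdf list-all java                      # or `asdf list all java`; pick an 8.x build
asdf install java <chosen-8.x-version>  # placeholder, use a real id from the list
asdf global java <chosen-8.x-version>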
I ran into this issue when running Jupyter Notebook and Spark using Java 11. I installed and configured Java 8 using the following steps.
Install Java 8:
$ sudo apt install openjdk-8-jdk
Since I had already installed Java 11, I then set my default Java to version 8 using:
$ sudo update-alternatives --config java
Select Java 8 and then confirm your changes:
$ java -version
Output should be similar to:
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
I'm now able to run Spark successfully in Jupyter Notebook. The steps above were based on the following guide: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04
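One additional, optional check (an assumption on my part, not part of the original steps): update-alternatives switches the java binary but does not touch JAVA_HOME, so if JAVA_HOME is set anywhere it should also point at the Java 8 install. The path below is the conventional one for openjdk-8-jdk on Ubuntu amd64 and may differ on your machine:

# Only needed if JAVA_HOME is set somewhere; adjust the path for your machine
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
# Sanity check that Spark itself now starts on this JVM
spark-submit --version      # or $SPARK_HOME/bin/spark-submit --version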